[Title of the Invention]

PARALLEL PROCESSOR

[Claims]



1. A parallel processor that performs parallel
processing of one or more basic instructions
contained in each of instruction words delimited by
instruction delimiting information, said parallel
processor characterized by comprising:

a plurality of instruction execution units that
perform processes corresponding to supplied basic
instructions in parallel;

an instruction fetch unit that fetches the
instruction words one by one in accordance with the
instruction delimiting information; and

an instruction issue unit that selectively issues
each of the basic instructions supplied from the
instruction fetch unit to one of the instruction
execution units to execute an issued basic instruction.

2. The parallel processor according to claim
1, characterized in that the plurality of instruction
execution units all have the same structure.

3. The parallel processor according to claim 1,
characterized in that:

at least two of the instruction execution units
have different structures from each other; and

the instruction fetch unit rearranges the basic
instructions contained in each of the fetched
instruction words, in accordance with the arrangement of
the plurality of instruction execution units, and then
supplies the rearranged basic instructions to the
instruction issue unit.

4. The parallel processor according to claim
1, characterized in that:

at least two of the instruction execution units
have different structures from each other; and

the instruction issue unit rearranges the basic
instructions contained in each of the instruction words
supplied from the instruction fetch unit, in accordance
with the arrangement of the plurality of instruction
execution units, and then supplies the rearranged basic
instructions to the instruction execution units.



5. The parallel processor according to any one
of claims 1 to 4, characterized in that:

depending on the type of a basic instruction being
currently executed by one of the instruction execution
units, the instruction issue unit issues a next basic
instruction before the execution of the basic
instruction being currently executed is completed.

[Detailed Description of the Invention]
[0001] 

[Field of the Invention] 

The present invention generally relates to
processors, and, more particularly, to a parallel
processor that executes a plurality of basic
instructions in parallel.
[0002] 
[Prior Art] 

Generally, in a conventional computer system,
a plurality of basic instructions are executed in
parallel by pipeline processing, thereby improving its
performance. Conventionally, a plurality of basic
instructions constitute a fixed-length instruction word,
and a very long instruction word (VLIW) technique is
employed as a method for executing a plurality of basic
instructions contained in one instruction word in
parallel. Also, a super scalar technique may be employed.
In accordance with the super scalar technique, basic
instructions are executed in parallel depending on the
number of basic instructions contained in each
instruction word.

[0003] 

FIG. 1 shows the structure of a conventional
parallel processor 10. This parallel processor 10
comprises an instruction fetch unit 1 connected to a
memory 7, an instruction issue unit 3 connected to the
instruction fetch unit 1, instruction execution units
EU0 to EUn each connected to the instruction issue unit
3, and a register unit 5 connected to each of the
instruction execution units EU0 to EUn.
[0004] 

The instruction fetch unit 1 fetches an
instruction word from the memory 7, and supplies the
instruction word to the instruction issue unit 3. The
instruction issue unit 3 issues the basic instructions
contained in the supplied instruction word to the
instruction execution units EU0 to EUn. If the
instruction execution units EU0 to EUn are still
executing previous basic instructions at this point, the
instruction issue unit 3 waits for the end of the
execution. When the execution ends, the instruction
issue unit 3 supplies the basic instructions to the
instruction execution units EU0 to EUn.
[0005]

The instruction execution units EU0 to EUn
execute the basic instructions, and notify the
instruction issue unit 3 of the end of the execution.
The register unit 5 supplies data to the instruction
execution units EU0 to EUn, if necessary, and holds the
execution results of the instruction execution units EU0
to EUn. The externally connected memory 7 stores an
instruction word string to be executed in the parallel
processor 10. The memory 7 also stores the data that the
instruction execution units EU0 to EUn need to execute
instructions, and the data obtained as execution results.
[0006] 

FIG. 2 shows the formats of instruction words
to be supplied to a parallel processor having four
instruction execution units EU0 to EU3. As shown in FIG.
2, each instruction word is made up of basic
instructions EI and do-nothing instructions NOP. If the
number of basic instructions contained in one
instruction word to be executed in parallel is smaller
than the number of the instruction execution units EU0
to EU3, the proportion of do-nothing instructions is
large.
[0007] 

In the conventional parallel processing method
of executing a plurality of basic instructions by the
VLIW technique, each instruction word has a fixed length.
Therefore, if the number of basic instructions to be
executed in parallel is smaller than a predetermined
number, do-nothing instructions are added to comply with
the predetermined length. Because of that, in a program
having a small number of basic instructions in total,
the proportion of do-nothing instructions is large, and
the amount of instruction code increases accordingly,
resulting in problems such as poor usage efficiency of
memory, a decrease of the hit ratio of cache memory, and
an increase of the load on the instruction fetch
mechanism.
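
The padding overhead described above can be illustrated with a small calculation. The following is only a sketch: the function name `nop_fraction` and the sample program are hypothetical and are not taken from the specification. It counts the share of slots that a fixed-length encoding fills with do-nothing instructions.

```python
def nop_fraction(words, slots_per_word):
    """Return the fraction of slots occupied by NOP padding.

    words: list of instruction words, each a list of basic instructions.
    slots_per_word: fixed number of slots in every instruction word.
    """
    if any(len(word) > slots_per_word for word in words):
        raise ValueError("an instruction word exceeds the fixed length")
    total_slots = len(words) * slots_per_word
    if total_slots == 0:
        return 0.0
    useful_slots = sum(len(word) for word in words)
    return (total_slots - useful_slots) / total_slots


# Hypothetical program for a processor with four execution units.
program = [["EI"], ["EI", "EI"], ["EI"], ["EI", "EI", "EI", "EI"]]
print(nop_fraction(program, 4))  # 8 of 16 slots are NOP -> 0.5
```

In this made-up example half of the instruction code is padding, which is the kind of overhead a variable-length instruction word is meant to avoid.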
[0008] 

With the super scalar technique, there is also
a problem that a large-scale circuit is needed to
increase the number of instructions to be executed in
parallel.
[0009] 

[Problems to be Solved by the Invention]

A general object of the present invention is
to provide parallel processors in which the above
disadvantages are eliminated.

A more specific object of the present
invention is to provide a parallel processor that is
capable of performing highly efficient parallel
processing.
[0010] 

[Means to Solve the Problems] 

To achieve the objects, according to one
aspect of the present invention, there is provided a
parallel processor that performs parallel processing of
one or more basic instructions contained in each of
instruction words delimited by instruction delimiting
information, the parallel processor comprising:

a plurality of instruction execution units that
perform processes corresponding to the supplied basic
instructions in parallel;

an instruction fetch unit that fetches the
instruction words one by one in accordance with the
instruction delimiting information; and

an instruction issue unit that selectively issues
each of the basic instructions supplied from the
instruction fetch unit to one of the instruction
execution units to execute the basic instruction.
[0011]

With the parallel processor having the above
structure, the length of each instruction word can be
made variable, because the instruction fetch unit
fetches the instruction words one by one in
accordance with the instruction delimiting information.
Also, the instruction execution units can efficiently
execute the instruction words, because each of the basic
instructions is selectively issued to a corresponding
one of the instruction execution units. In the above
configuration, all of the instruction execution units
may have the same structure.
[0012] 

According to another aspect of the present
invention, at least two of the instruction execution
units have different structures from each other, and the
instruction fetch unit may rearrange the basic
instructions contained in each of the fetched
instruction words, in accordance with the arrangement of
the plurality of instruction execution units, and then
supply the rearranged basic instructions to the
instruction issue unit. According to this configuration,
since the basic instructions are supplied to the
instruction issue unit in accordance with the
arrangement of the plurality of instruction execution
units, the basic instructions can be effectively
executed in parallel.
[0013] 

According to still another aspect of the
present invention, at least two of the instruction
execution units have different structures from each
other, and the instruction issue unit may rearrange the
basic instructions contained in each of the instruction
words supplied from the instruction fetch unit, in
accordance with the arrangement of the plurality of
instruction execution units, and then supply the
rearranged basic instructions to the instruction
execution units.

Furthermore, depending on the type of a basic
instruction being currently executed by one of the
instruction execution units, the instruction issue unit
may issue a next basic instruction before the execution
of the basic instruction being currently executed is
completed.
[0014] 

[Embodiments of the Invention]

The embodiments of the present invention will
be explained in detail below with reference to the
accompanying drawings. Among the drawings, the same
reference numerals are used to designate the same or
corresponding components.
[First Embodiment]

FIGS. 3 and 6 show parallel processors 20 and
21 in accordance with a first embodiment of the present
invention. The parallel processor 20 comprises an
instruction fetch unit 46 connected to a memory 12, an
instruction issue unit 72 connected to the instruction
fetch unit 46, two instruction execution units EU0 and
EU1 having the same structure and connected to the
instruction issue unit 72, and a register unit 98
connected to each of the instruction execution units EU0
and EU1. Likewise, the parallel processor 21 comprises
an instruction fetch unit 47 connected to a memory 12,
an instruction issue unit 73 connected to the
instruction fetch unit 47, two instruction execution
units EU0 and EU1 having the same structure and
connected to the instruction issue unit 73, and a
register unit 98 connected to each of the instruction
execution units EU0 and EU1.
[0015] 

It should be noted that, in the following
description, the maximum basic instruction length of one
instruction word is 2. However, the parallel processor
in accordance with the first embodiment should operate
in the same manner in a case where the maximum basic
instruction length in one instruction word is 3 or
greater.
(Example 1)

FIG. 4 shows the structure of the instruction
fetch unit 46 and the instruction issue unit 72. The
instruction fetch unit 46 comprises a fetch program
counter (FPC) 300, adders 324 and 325, an instruction
buffer 308, a cutting unit 316, and an execution program
counter (EPC) 339.
[0016] 

The FPC 300 is connected to the memory 12 and
the instruction execution units EU0 and EU1. The adder
324 is connected to the FPC 300. The instruction buffer
308 is connected to the memory 12, and the cutting unit
316 is connected to the instruction buffer 308. The
adder 325 is connected to the cutting unit 316, and the
EPC 339 is connected to the adder 325 and the register
unit 98. The FPC 300 receives a fetch address contained
in an instruction word from the memory 12, and the
instruction buffer 308 receives fetch data contained in
the instruction word from the memory 12. The FPC 300
further receives a branch destination address
corresponding to a branch instruction from the
instruction execution units EU0 and EU1.
[0017] 

On the other hand, the instruction issue unit
72 comprises an instruction register 347, selectors 355
and 356, a control unit 370, and an AND gate 378. Here,
the instruction register 347 is connected to the cutting
unit 316. The selectors 355 and 356 are both connected
to the instruction register 347. The selector 355 is
connected to the instruction execution unit EU0, while
the selector 356 is connected to the instruction
execution unit EU1. The control unit 370 is connected to
the AND gate 378 and the selectors 355 and 356. The AND
gate 378 is connected to the instruction execution units
EU0 and EU1. In this structure, the instruction
execution units EU0 and EU1 transmit execution complete
signals EUc0 and EUc1, respectively, to the AND gate 378.
[0018] 

FIG. 5 shows the formats of instruction words
to be supplied to the parallel processors of the first
embodiment. Each instruction word is made up of one or
more basic instructions EI and at least one of
instruction word delimiting fields 0 and 1. The basic
instruction word length is either 1 or 2. The upper row
of FIG. 5 indicates an instruction word having a basic
instruction word length of 2, consisting of a basic
instruction word made up of an instruction word
delimiting field 0 and a basic instruction EI, and
another basic instruction word made up of an instruction
word delimiting field 1 and a basic instruction EI. The
lower row of FIG. 5 indicates an instruction word having
a basic instruction word length of 1, consisting of an
instruction word delimiting field 1 and a basic
instruction EI.
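
The delimiting scheme of FIG. 5 can be sketched in a few lines. The code below is an illustration only, under the assumption that each basic instruction word is modelled as a pair of a one-bit delimiting field and a basic instruction, and that a field of 1 marks the last basic instruction of an instruction word. The function name `cut_instruction_words` and the sample stream are hypothetical.

```python
def cut_instruction_words(stream):
    """Group (delimiting_field, basic_instruction) pairs into instruction words.

    A delimiting field of 0 means another basic instruction follows in the
    same instruction word; a field of 1 closes the instruction word.
    """
    words = []
    current = []
    for field, basic_instruction in stream:
        if field not in (0, 1):
            raise ValueError("delimiting field must be 0 or 1")
        current.append(basic_instruction)
        if field == 1:
            words.append(current)
            current = []
    if current:
        raise ValueError("stream ends in the middle of an instruction word")
    return words


# Upper row of FIG. 5 (length 2) followed by the lower row (length 1).
stream = [(0, "EI_a"), (1, "EI_b"), (1, "EI_c")]
print(cut_instruction_words(stream))  # [['EI_a', 'EI_b'], ['EI_c']]
```

No do-nothing instruction appears anywhere in the stream: a short instruction word simply ends sooner.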
[0019]

The above instruction words are stored in the
memory 12 in advance. The adder 324 in the instruction
fetch unit 46 of the parallel processor 20 increments
the address by a fixed length DISP, so that the
instruction words can be fetched from the memory 12 in
order. When the cutting unit 316 in the instruction
fetch unit 46 fetches the instruction word of the upper
row of FIG. 5, for instance, it recognizes the
instruction word delimiting field and the following
basic instruction EI as one instruction word. The
cutting unit 316 then cuts the instruction word from the
instruction word string, and stores it in the
instruction fetch unit 46. The adder 325 calculates the
address corresponding to the basic instruction EI to be
executed in accordance with an instruction word length
signal SL supplied from the cutting unit 316. The
calculated address is temporarily stored in the EPC 339.
The address stored in the EPC 339 is supplied to the
register unit 98 as a return address for re-executing
the basic instruction EI.
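
The two address calculations can be pictured with a toy model. This is a sketch under simplifying assumptions that the specification does not state: addresses are counted in basic instruction words, both counters start at the same address, and branches are ignored. The class name `FetchCounters` and its methods are hypothetical.

```python
class FetchCounters:
    """Toy model of the fetch (FPC) and execution (EPC) program counters."""

    def __init__(self, start, disp):
        self.fpc = start  # address of the next fetch from memory
        self.epc = start  # address of the next instruction word to execute
        self.disp = disp  # fixed fetch increment (DISP), as added by adder 324

    def fetch(self):
        """Return the current fetch address, then advance the FPC by DISP."""
        address = self.fpc
        self.fpc += self.disp
        return address

    def cut(self, sl):
        """Return the address of the word just cut, then advance the EPC by SL."""
        address = self.epc
        self.epc += sl  # SL is the instruction word length, as added by adder 325
        return address


counters = FetchCounters(start=0, disp=4)
fetch_addresses = [counters.fetch(), counters.fetch()]
word_addresses = [counters.cut(2), counters.cut(1), counters.cut(1)]
print(fetch_addresses)  # [0, 4]
print(word_addresses)   # [0, 2, 3]
```

The point of keeping two counters is that fetching advances in fixed steps, while the address of the instruction word being executed advances by a variable length.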
[0020] 

Based on the instruction word delimiting
fields 0 and 1 contained in the instruction words
supplied from the cutting unit 316, the instruction
issue unit 72 recognizes each basic instruction EI, and
issues each basic instruction EI selectively to one of
the instruction execution units EU0 and EU1 via the
selectors 355 and 356. Accordingly, a basic
instruction EI following an instruction word delimiting
field 0 is issued to the instruction execution unit EU0,
while a basic instruction EI following an instruction
word delimiting field 1 is issued to the instruction
execution unit EU1. The selectors 355 and 356 are
controlled by the control unit 370. When the execution
of one instruction word is completed, the corresponding
basic instruction EI is supplied to the instruction
execution units EU0 and EU1 via the selectors 355 and
356.
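
The selection rule of this example can be sketched as follows. The code is illustrative only: the function names `select_units` and `may_issue_next_word` are hypothetical, and the second function assumes that the AND gate 378 simply combines the execution complete signals EUc0 and EUc1 to tell the control unit 370 when the next instruction word may be issued.

```python
def select_units(instruction_word):
    """Map each basic instruction to an execution unit by its delimiting field.

    instruction_word: list of (delimiting_field, basic_instruction) pairs.
    Field 0 selects EU0 and field 1 selects EU1, as in Example 1.
    """
    issued = {}
    for field, basic_instruction in instruction_word:
        unit = "EU0" if field == 0 else "EU1"
        if unit in issued:
            raise ValueError("two basic instructions selected " + unit)
        issued[unit] = basic_instruction
    return issued


def may_issue_next_word(euc0, euc1):
    """Model of the AND gate: both units must report completion."""
    return euc0 and euc1


print(select_units([(0, "EI_a"), (1, "EI_b")]))  # {'EU0': 'EI_a', 'EU1': 'EI_b'}
print(select_units([(1, "EI_c")]))               # {'EU1': 'EI_c'}
print(may_issue_next_word(True, False))          # False
```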

[0021]

Likewise, in a case where the instruction
fetch unit 46 fetches and then supplies the instruction
word having the basic instruction word length of 1 to
the instruction buffer 308, the cutting unit 316
cuts the basic instruction EI that follows the
instruction word delimiting field 1 from the rest of the
instruction word. The instruction register 347 then
issues the basic instruction EI to one of the
instruction execution units EU0 and EU1.

[0022]

The instruction word delimiting fields 0 and 1
are both represented by one bit, but any sort of data
can be written in those fields as long as they can
function to delimit the instruction words. In this
example, the two instruction execution units EU0 and EU1
having the same structure are employed, but it is also
possible to employ three or more instruction execution
units.
[0023] 

As described so far, in the parallel processor
20 of this example, the instruction fetch unit 46
fetches instruction words one by one in accordance with
the instruction word delimiting fields 0 and 1, so that
the length of each of the instruction words can be made
variable. The instruction issue unit 72 then issues a
basic instruction EI to a corresponding one of the
instruction execution units EU0 and EU1. Accordingly,
there is no need to include do-nothing instructions NOP
in any instruction word, and basic instructions EI can
be efficiently included in each instruction word. By
executing the basic instructions EI in the instruction
words, the parallel processing performance of the
parallel processor can be improved.
(Example 2) 

FIG. 6 shows the structure of a second example
of the parallel processor 21 in accordance with the
first embodiment of the present invention. As shown in
FIG. 6, the parallel processor 21 has the same structure
as the parallel processor 20 shown in FIG. 3, except for
a judgment unit 103 that determines whether or not each
basic instruction EI supplied to the instruction issue
unit 73 has data dependence or control dependence with a
basic instruction EI being executed by one of the
instruction execution units EU0 and EU1, and whether or
not each basic instruction EI shares one resource with
another basic instruction EI being executed by one of
the instruction execution units EU0 and EU1.
[0024] 

The judgment unit 103 compares a destination
register number (write register number) defined in a
basic instruction EI in execution with a source register
number (read register number) defined in a basic
instruction EI to be issued to one of the instruction
execution units EU0 and EU1. If the destination register
number coincides with the source register number, it is
confirmed that there is data dependence between the two
basic instructions EI. If the destination register
number does not coincide with the source register number,
it is confirmed that there is no data dependence between
the two basic instructions EI, and the operation can



[0025] 

The judgment unit 103 also determines whether
or not the basic instruction EI in execution contains a
branch instruction, and whether or not the basic
instruction EI has a possibility of starting an
irregular process such as a division by 0. If the basic
instruction EI in execution contains a branch
instruction or has a possibility of an irregular process,
there is control dependence between the basic
instruction EI in execution and the basic instruction EI
to be issued to the instruction execution unit EU0 or
EU1. If the basic instruction EI in execution neither
contains a branch instruction nor has a possibility of
an irregular process, it is confirmed that there is no
control dependence between the two basic instructions EI.

Based on the contents of each basic
instruction EI, the judgment unit 103 also compares the
resource (the instruction execution units EU0 and EU1,
for instance) required by the basic instruction EI in
execution with the resource required by the basic
instruction EI to be issued. If the resource required by
the basic instruction EI in execution is the same as the
resource required by the basic instruction EI to be
issued, there is resource sharing between the two basic
instructions EI. If the resources are different, it is
confirmed that there is no resource sharing between the
two basic instructions EI.
[0026] 

If the basic instruction EI to be issued has
neither data dependence nor control dependence, and
causes no resource sharing with the basic instruction EI
being executed by the instruction execution units EU0
and EU1, the instruction issue unit 73 issues the basic
instruction EI to a corresponding one of the instruction
execution units EU0 and EU1 before the end of the
execution. Here, the instruction issuance by the
instruction issue unit 73 and the instruction execution
by the instruction execution units EU0 and EU1 are
processed by time-sharing parallel processing.
[0027] 

On the other hand, if the basic instruction EI
to be issued has data dependence and/or control
dependence, and/or causes resource sharing with the
basic instruction EI being executed by the instruction
execution units EU0 and EU1, the basic instruction EI is
issued to a corresponding one of the instruction
execution units EU0 and EU1 after the end of the
execution.
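
The three checks made by the judgment unit 103 and the resulting issue decision can be summarized in a short sketch. It is an illustration under assumptions: the record type `BasicInstruction`, its field names, and the function names are hypothetical, and a basic instruction is reduced to one destination register, a tuple of source registers, and the name of the execution unit it needs.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class BasicInstruction:
    dest: Optional[int]       # destination (write) register number, if any
    srcs: Tuple[int, ...]     # source (read) register numbers
    unit: str                 # execution unit (resource) required
    is_branch: bool = False   # contains a branch instruction
    may_trap: bool = False    # may start an irregular process (e.g. division by 0)


def has_data_dependence(executing, candidate):
    """The destination of the executing instruction is read by the candidate."""
    return executing.dest is not None and executing.dest in candidate.srcs


def has_control_dependence(executing):
    """The executing instruction is a branch or may start an irregular process."""
    return executing.is_branch or executing.may_trap


def shares_resource(executing, candidate):
    """Both instructions require the same execution unit."""
    return executing.unit == candidate.unit


def can_issue_before_completion(in_flight, candidate):
    """True only if no check fails against any instruction still executing."""
    return not any(
        has_data_dependence(e, candidate)
        or has_control_dependence(e)
        or shares_resource(e, candidate)
        for e in in_flight
    )


add = BasicInstruction(dest=3, srcs=(1, 2), unit="EU0")
use = BasicInstruction(dest=5, srcs=(3, 4), unit="EU1")    # reads register 3
other = BasicInstruction(dest=6, srcs=(7, 8), unit="EU1")  # independent
print(can_issue_before_completion([add], use))    # False
print(can_issue_before_completion([add], other))  # True
```

When the function returns False, the candidate waits until the instruction in execution has completed, as in the conventional case.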

Although the two instruction execution units
EU0 and EU1 having the same structure are employed in
this example, it is also possible to employ three or
more instruction execution units.
[0028] 

As described so far, the parallel processor 21
of this example can have the same effects as the
parallel processor 20 of Example 1, and efficiently and
accurately performs the parallel processing of the basic
instructions EI. Thus, more reliable operations can be
achieved.

[Second Embodiment]

FIGS. 7, and 12 to 21 show parallel processors
22 to 27 in accordance with a second embodiment of the
present invention. Each of the parallel processors 22 to
27 comprises a corresponding one of instruction fetch
units 48 to 53 connected to a memory 12, a corresponding
one of instruction issue units 74 to 79 connected to
that instruction fetch unit, instruction execution
units LU0, IU0, IU1, FU0, FU1, and BU0 connected to that
instruction issue unit, and a register unit 99
connected to all the instruction execution units LU0,
IU0, IU1, FU0, FU1, and BU0.
[0029] 

The instruction execution unit LU0 is a load
store instruction execution unit that executes a load
instruction and a store instruction. After the execution
of these instructions, the instruction execution unit
LU0 notifies the instruction issue unit 74-79 of the end
of the execution. The instruction execution units IU0
and IU1 are integer arithmetic instruction execution
units that execute integer arithmetic instructions. When
the execution of the integer arithmetic instructions is
completed, the instruction execution units IU0 and IU1
notify the instruction issue unit 74-79 of the end of
the execution.
[0030] 

The instruction execution units FU0 and FU1
are floating-point arithmetic instruction execution
units that execute floating-point arithmetic
instructions. When the execution of the floating-point
arithmetic instructions is completed, the instruction
execution units FU0 and FU1 notify the instruction issue
unit 74-79 of the end of the execution. The instruction
execution unit BU0 is a branch instruction execution
unit that executes a branch instruction. When the
execution of the branch instruction is completed, the
instruction execution unit BU0 notifies the instruction
issue unit 74-79 of the end of the execution.
[0031] 

In the following examples, the maximum basic
instruction word length contained in one instruction
word is 2, but the same effects can be expected in a
case where the maximum basic instruction word length is
3 or greater.
(Example 1) 

FIG. 7 shows the structure of a first example
of the parallel processor in accordance with the second
embodiment of the present invention. As shown in FIG. 7,
the parallel processor 22 comprises a conversion unit
115 in the instruction fetch unit 48. The conversion
unit 115 rearranges basic instructions contained in one
fetched instruction word in accordance with the
structure of the instruction execution units LU0, IU0,
IU1, FU0, FU1, and BU0, and then supplies the rearranged
basic instructions to the instruction issue unit 74.

This rearrangement by the conversion unit 115
facilitates the issuance of the basic instructions by
the instruction issue unit 74.
[0032] 

More specifically, the parallel processor of
the present invention is embodied on a printed board or
an LSI circuit. The components are arranged on a
two-dimensional surface and connected by wires. At this
point, the wires might cross each other. However, a
printed board and an LSI circuit typically have a
plurality of wiring layers, so that any two wires that
might cross each other can be arranged on two different
wiring layers. Logically, it is possible to place wires
in any desired arrangement. In view of the operation
speed of the circuit, however, the above alternate
wiring (arranging wires on different wiring layers)
requires longer wires, which will decrease the operation
speed. Therefore, it is preferable to have fewer
alternate wirings. Shorter wires will facilitate the
issuance of the basic instructions by the instruction
issue unit 74, and increase the operation speed.
[0033] 

FIG. 8 shows the structures of the instruction
fetch unit 48 and the instruction issue unit 74 of the
parallel processor 22 shown in FIG. 7. The instruction
fetch unit 48 and the instruction issue unit 74 have the
same structures as the instruction fetch unit 46 and the
instruction issue unit 72 shown in FIG. 4, except that
the instruction fetch unit 48 includes the conversion
unit 115 connected to a cutting unit 317. The
instruction execution unit BU0 supplies information,
such as a branch destination address corresponding to a
branch instruction, to an FPC 301.
[0034] 

For simplification of the drawing, only two
instruction passages from an instruction register 348 to
the two instruction execution units LU0 and IU0 are
shown in FIG. 8. However, it should be understood that
there are also instruction passages to the other
instruction execution units IU1, FU0, FU1, and BU0, as
shown in FIG. 7. Likewise, only two signals LUc and IUc0
to the AND gate 379 are shown, but there are also
passages for the other signals to the AND gate 379.
[0035] 

The parallel processor 22 of this example
operates in the following manner. First, the cutting
unit 317 of the instruction fetch unit 48 recognizes the
basic instructions up to the instruction word delimiting
field 1 as one instruction word, in the same manner as
in the parallel processor according to the first
embodiment, and fetches instruction words one by one.
The formats 13 of the instruction words to be supplied
to the instruction fetch unit 48 are shown in FIG. 9.
As shown in FIG. 9, each of the instruction words
includes an instruction word delimiting field 0 and/or
an instruction word delimiting field 1 indicating
delimitation of a basic instruction, and one or two
instructions selected from the group consisting of an
integer arithmetic instruction II, a floating-point
arithmetic instruction FI, a load store instruction LI,
and a branch instruction BI.
[0036]

An interface 15 for the instruction execution
units LU0, IU0, IU1, FU0, FU1, and BU0 includes
effective bits V, information II required for executing
an integer arithmetic instruction, information FI
required for executing a floating-point arithmetic
instruction, information LI required for executing a
load store instruction, and information BI required for
executing a branch instruction. The interface 15
supplies the effective bit V and the information LI from
the instruction issue unit 74 to the instruction
execution unit LU0, the effective bit V and the
information II to the instruction execution units IU0
and IU1, the effective bit V and the information FI to
the instruction execution units FU0 and FU1, and the
effective bit V and the information BI to the
instruction execution unit BU0.
[0037] 

When the effective bit V is 0, no basic
instruction is issued, and when the effective bit V is
1, a basic instruction is issued. Each effective bit V
is coupled with the information II, FI, LI, or BI, and
is then allocated to each corresponding instruction
execution unit.

As shown in FIG. 9, the instruction word
formats 13 are rearranged and converted into instruction
word formats 17 by the conversion unit 115 in the
instruction fetch unit 48. The instruction word formats
17 correspond to the instruction execution units LU0,
IU0, IU1, FU0, FU1, and BU0, and are supplied to the
instruction register 348 in the instruction issue unit
74. The instruction register 348 issues basic
instructions each having the effective bit V of 1 to
corresponding instruction execution units. For instance,
when the instruction word on the uppermost row of the
instruction word format 17 is supplied to the
instruction issue unit 74, the instruction issue unit 74
issues the floating-point arithmetic instruction FI
provided with "1" as the effective bit V to the
instruction execution unit FU0, and the branch
instruction BI also provided with "1" as the effective
bit V to the instruction execution unit BU0.
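
The conversion from the fetched format 13 to the per-unit format 17 can be sketched in software. This is a behavioural illustration only, not a description of the circuit of the conversion unit 115. It assumes that when two basic instructions of the same type appear in one instruction word, the first goes to the lower-numbered unit and the second to the other unit; the names `UNITS_BY_TYPE`, `SLOT_ORDER`, and `convert` are hypothetical.

```python
# Execution units able to execute each instruction type.
UNITS_BY_TYPE = {
    "LI": ["LU0"],
    "II": ["IU0", "IU1"],
    "FI": ["FU0", "FU1"],
    "BI": ["BU0"],
}
# Order of the slots in the converted instruction word.
SLOT_ORDER = ["LU0", "IU0", "IU1", "FU0", "FU1", "BU0"]


def convert(instruction_word):
    """Rearrange (type, info) pairs into one slot per execution unit.

    Returns a list of (unit, effective_bit, info) in SLOT_ORDER, where the
    effective bit is 1 for a slot that holds a basic instruction and 0 otherwise.
    """
    assigned = {}
    for kind, info in instruction_word:
        for unit in UNITS_BY_TYPE[kind]:
            if unit not in assigned:
                assigned[unit] = info
                break
        else:
            raise ValueError("no free execution unit for " + kind)
    return [(u, 1 if u in assigned else 0, assigned.get(u)) for u in SLOT_ORDER]


# The case described for the uppermost row: one FI and one BI.
for slot in convert([("FI", "fadd"), ("BI", "branch")]):
    print(slot)
```

Only the slots for FU0 and BU0 carry an effective bit of 1 in this case, so only those two units receive a basic instruction.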
[0038] 

As a result, the instruction execution unit
FU0 executes the floating-point arithmetic instruction
FI, and the instruction execution unit BU0 executes the
branch instruction BI. In this case, no basic
instructions are executed by the other instruction
execution units LU0, IU0, IU1, and FU1.

FIG. 10 is a circuit diagram of the conversion
unit 115 shown in FIG. 8. As shown in FIG. 10, the
conversion unit 115 comprises transmission lines L1 and
L2, BI detectors BD1 and BD2, FI detectors FD1 and FD2,
II detectors ID1 and ID2, LI detectors LD1 and LD2,
buffers 155 to 158, AND gates 163 to 166, 185, and 186,
exclusive OR gates 187 to 190, selectors 209 to 212, and
OR gates 199 to 202.
[0039] 

The transmission line L1 transmits the first
basic instruction contained in each instruction word,
and the transmission line L2 transmits the second basic
instruction contained in each instruction word. The BI
detector BD1 is connected to the transmission line L1,
and the BI detector BD2 is connected to the transmission
line L2. The buffer 155 is connected to the BI detector
BD1, and the AND gate 163 is connected to the BI
detectors BD1 and BD2. The selector 209 is connected to
the transmission lines L1 and L2, the buffer 155, and
the AND gate 163. The OR gate 199 is connected to the
buffer 155 and the AND gate 163.
[0040] 

The FI detector FD1 is connected to the
transmission line L1, and the FI detector FD2 is
connected to the transmission line L2. The buffer 156 is
connected to the FI detector FD1, and the AND gate 164
is connected to the FI detectors FD1 and FD2. The two
input terminals of the exclusive OR gate 187 are
connected to the input node and the output node,
respectively, of the buffer 156. The two input terminals
of the exclusive OR gate 188 are connected to the output
node of the AND gate 164 and the FI detector FD2,
respectively. The AND gate 185 is connected to the two
exclusive OR gates 187 and 188. The selector 210 is
connected to the transmission lines L1 and L2, the
buffer 156, and the AND gate 164. The OR gate 200 is
connected to the buffer 156 and the AND gate 164.
[0041] 

The II detector ID1 is connected to the
transmission line L1, and the II detector ID2 is
connected to the transmission line L2. The buffer 157 is
connected to the II detector ID1, and the AND gate 165
is connected to the II detectors ID1 and ID2. The two
input terminals of the exclusive OR gate 189 are
connected to the input node and the output node,
respectively, of the buffer 157. The two input terminals
of the exclusive OR gate 190 are connected to the output
node of the AND gate 165 and the II detector ID2,
respectively. The AND gate 186 is connected to the two
exclusive OR gates 189 and 190. The selector 211 is
connected to the transmission lines L1 and L2, the
buffer 157, and the AND gate 165. The OR gate 201 is
connected to the buffer 157 and the AND gate 165.
[0042] 

The LI detector LD1 is connected to the
transmission line L1, and the LI detector LD2 is
connected to the transmission line L2. The buffer 158 is
connected to the LI detector LD1, and the AND gate 166
is connected to the LI detectors LD1 and LD2. The
selector 212 is connected to the transmission lines L1
and L2, the buffer 158, and the AND gate 166. The OR
gate 202 is connected to the buffer 158 and the AND gate
166.

[0043] 

The two BI detectors BD1 and BD2 constitute a
BI detector block 147. The two FI detectors FD1 and FD2
constitute an FI detector block 149. The two II
detectors ID1 and ID2 constitute an II detector block
151. The two LI detectors LD1 and LD2 constitute an LI
detector block 153.
[0044]

In the following, an operation of the
conversion unit 115 having the above structure will be
described by way of an example case where the
instruction word including the basic instructions BI and
FI on the uppermost row of the instruction word formats
13 shown in FIG. 9 is supplied to the conversion unit
115. First, the basic instruction BI is transmitted
through the transmission line L1. The BI detector BD1
then detects the basic instruction BI and supplies a
detection signal of logic 1 to the buffer 155. At this
point, the AND gate 163 outputs a logic 0 signal. In
accordance with the detection signal supplied from the
buffer 155, the selector 209 selects the first basic
instruction BI and outputs the first basic instruction
BI, that is, an instruction to be executed by the
instruction execution unit BU0, to the instruction issue
unit 74. At the same time as the output of the basic
instruction BI, in accordance with the detection signal
supplied from the buffer 155, the OR gate 199 outputs
the effective bit V of logic 1. Because the first basic
instruction is BI, the FI detector FD1, the II
detector ID1, and the LI detector LD1 output
non-detection signals of logic 0. Accordingly, the selectors
210, 211, and 212 do not select the first basic
instruction transmitted through the transmission line L1.
[0045] 

Next, the second basic instruction FI in the
instruction word is transmitted through the transmission
line L2. As in the case of the first basic instruction
BI, the FI detector FD2 detects the second basic
instruction FI and supplies a detection signal of logic
1 to the AND gate 164. The AND gate 164 in turn outputs
a logic 1 signal. In accordance with the logic 1 signal
supplied from the AND gate 164, the selector 210 selects
the second basic instruction FI and outputs the second
basic instruction FI as an instruction to be executed by
the instruction execution unit FU0. At the same time as
the output of the basic instruction FI, the OR gate 200
outputs the effective bit V of logic 1 in accordance
with the detection signal supplied from the AND gate 164.
[0046] 

Because the second basic instruction is FI,
the BI detector BD2, the II detector ID2, and the LI
detector LD2 output non-detection signals of logic 0.
Accordingly, the selectors 209, 211, and 212 do not
select the second basic instruction transmitted through
the transmission line L2. Since neither the first nor the
second basic instruction is to be executed by the instruction
execution units LU0, IU0, IU1, and FU1, the
effective bit V of logic 0 is outputted from each of the
OR gates 201 and 202, and the AND gates 185 and 186.
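The net effect of the conversion walked through above can be
modelled behaviourally as follows. This is a sketch, not a
gate-level description of FIG. 10: the names UNITS, SLOT_ORDER,
and convert are assumptions, as is the choice to raise an error
when an instruction word holds more instructions of one type than
there are units for that type.

```python
# Execution units available per instruction type, in allocation order.
UNITS = {"LI": ["LU0"], "II": ["IU0", "IU1"],
         "FI": ["FU0", "FU1"], "BI": ["BU0"]}
SLOT_ORDER = ["LU0", "IU0", "IU1", "FU0", "FU1", "BU0"]

def convert(word):
    """Map instruction types in program order to one (V, instr) pair per slot."""
    used = {kind: 0 for kind in UNITS}
    slots = {unit: (0, None) for unit in SLOT_ORDER}
    for kind in word:
        units = UNITS[kind]
        if used[kind] >= len(units):
            # Behaviour for this case is not stated in the text (assumption).
            raise ValueError(f"no free unit for {kind}")
        slots[units[used[kind]]] = (1, kind)
        used[kind] += 1
    return [slots[unit] for unit in SLOT_ORDER]

print(convert(["BI", "FI"]))
```

For the word BI, FI only the FU0 and BU0 slots carry V = 1, and
the word FI, BI yields the same result, since the output position
depends on the instruction type rather than on the input order.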
[0047] 

In the above described manner, the conversion
unit 115 converts the instruction word formats 13 into
the instruction word formats 17, as shown in FIG. 9.

FIG. 11 is a circuit diagram of the conversion
unit 115 in a case where the maximum basic instruction
word length of one instruction word to be supplied from
the memory 12 to the instruction fetch unit 48 is 4. As
shown in FIG. 11, the structure of the conversion unit
115 in this case is the same as the structure of the
conversion unit 115 shown in FIG. 10, except that the
number of transmission lines is 4, the number of BI
detectors is 4, the number of FI detectors is 4, the
number of II detectors is 4, and the number of LI
detectors is 4. Also, two selectors 214 and 215 are
provided for a basic instruction FI, and two selectors
216 and 217 are provided for a basic instruction II in
this case.

[0048]

The conversion unit 115 further includes
buffers 159 to 162, AND gates 167 to 184, exclusive OR
gates 191 to 198, OR gates 203 to 208, and selectors 213
and 218. The four BI detectors BD1 to BD4 constitute a
BI detector block 148. The four FI detectors FD1 to FD4
constitute an FI detector block 150. The four II
detectors ID1 to ID4 constitute an II detector block 152.
The four LI detectors LD1 to LD4 constitute an LI
detector block 154.

[0049]
The conversion unit 115 having the above
structure operates in the same manner as the conversion
unit 115 shown in FIG. 10. In the following, an
operation of the conversion unit 115 in a case where an
instruction word made up of basic instructions BI, FI,
FI, and II is supplied to the conversion unit 115 will
be described. First, the first basic instruction BI is
transmitted through the transmission line L1. The BI
detector BD1 then detects the basic instruction BI and
supplies a detection signal of logic 1 to the buffer 159.
At this point, each of the AND gates 167 to 169 outputs
a logic 0 signal. In accordance with the detection
signal supplied from the buffer 159, the selector 213
selects the first basic instruction BI and outputs the
first basic instruction BI, that is, an instruction to be
executed by the instruction execution unit BU0, to the
instruction issue unit 74. At the same time as the
output of the first basic instruction BI, the OR gate
203 outputs the effective bit V of logic 1 in accordance
with the detection signal supplied from the buffer 159.
Because the first basic instruction is BI, the FI
detector FD1, the II detector ID1, and the LI detector
LD1 each output a non-detection signal of logic 0.
Accordingly, the selectors 214, 216, and 218 do not
select the first basic instruction BI transmitted
through the transmission line L1.
[0050] 

Next, the second basic instruction FI is
transmitted on the transmission line L2. The FI detector
FD2 then detects the second basic instruction FI and
supplies a detection signal of logic 1 to the AND gate
170. The AND gate 170 in turn outputs a logic 1 signal.
In accordance with the logic 1 signal supplied from the
AND gate 170, the selector 214 selects the second basic
instruction FI and outputs the second basic instruction
FI as an instruction to be executed by the instruction
execution unit FU0. At the same time as the output of
the second basic instruction FI, the OR gate 204 outputs
the effective bit V of logic 1 in accordance with the
detection signal supplied from the AND gate 170.
[0051] 

Because the second basic instruction is FI,
the BI detector BD2, the II detector ID2, and the LI
detector LD2 each output a non-detection signal of logic
0. Accordingly, the selectors 213, 216, and 218 do not
select the second basic instruction FI transmitted
through the transmission line L2.

Next, the third basic instruction FI is
transmitted through the transmission line L3. The FI
detector FD3 then detects the third basic instruction FI
and supplies a detection signal of logic 1 to the
AND gate 171. Since the AND gate 171 has already
received a detection signal of logic 1 from the FI
detector FD2 at this point, the output of the AND gate
171 is a logic 0 signal. Because of that, the exclusive
OR gate 193 outputs a logic 1 signal, and the AND gate
174 also outputs a logic 1 signal. In accordance with
the logic 1 signal supplied from the AND gate 174, the
selector 215 selects the third basic instruction FI and
outputs the third basic instruction FI as an instruction
to be executed by the instruction execution unit FU1. At
the same time as the output of the third basic
instruction FI, the OR gate 205 outputs the effective
bit V of logic 1 in accordance with the signal supplied
from the AND gate 174.
[0052] 

Because the third basic instruction is FI,
the BI detector BD3, the II detector ID3, and the LI
detector LD3 each output a non-detection signal of logic
0. Accordingly, the selectors 213, 216, and 218 do not
select the third basic instruction FI transmitted
through the transmission line L3.

Next, the fourth basic instruction II of the
instruction word is transmitted through the transmission
line L4. The II detector ID4 then detects the fourth
basic instruction II and supplies a detection signal of
logic 1 to the AND gate 178. The AND gate 178 in turn
outputs a logic 1 signal. In accordance with the logic 1
signal supplied from the AND gate 178, the selector 216
selects the fourth basic instruction II and outputs the
fourth basic instruction II as an instruction to be
executed by the instruction execution unit IU0. At the
same time as the output of the fourth basic instruction
II, the OR gate 206 outputs the effective bit V of logic
1 in accordance with the signal supplied from the AND
gate 178.
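The way the FI path separates the first FI from the second FI can
be sketched as a small priority computation. This is one reading
of the gate description, not a transcription of FIG. 11: the AND
stage is assumed to pass a detection only when no earlier line has
detected the same type, and the exclusive OR stage is assumed to
mark detections that were not the first. The function name
first_and_second is hypothetical.

```python
def first_and_second(detect):
    """detect: 0/1 detector outputs, one per transmission line (L1 first).

    Returns two one-hot lists marking the first and the second detection.
    """
    first, rest = [], []
    seen = 0
    for d in detect:
        f = d & (1 - seen)   # AND stage: detected here, nothing earlier
        first.append(f)
        rest.append(d ^ f)   # XOR stage: detected, but not the first one
        seen |= d
    second = []
    seen_rest = 0
    for r in rest:
        second.append(r & (1 - seen_rest))  # first of the remaining detections
        seen_rest |= r
    return first, second

# FI detectors FD1 to FD4 for the instruction word BI, FI, FI, II.
first, second = first_and_second([0, 1, 1, 0])
print(first, second)  # prints [0, 1, 0, 0] [0, 0, 1, 0]
```

In this model the first list would drive the selector for FU0
(line L2 here) and the second list the selector for FU1 (line L3),
which matches the walk-through above.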
[0053] 

As described above, in the parallel processor
of this example, basic instructions contained in each
instruction word supplied to the instruction fetch unit
48 are rearranged in accordance with the arrangement of
the instruction execution units, so that the instruction
issue unit 74 can smoothly issue the basic instructions
to the respective instruction execution units. Thus, the
overall operation speed can be increased.

In this example, the instruction fetch unit 48
can also fetch an instruction word containing basic
instructions that have already been arranged in
accordance with the arrangement of the instruction
execution units. In such a case, because the basic
instructions are arranged in advance, the circuit
size required for rearranging the basic instructions in
the instruction fetch unit 48 can be reduced.
[0054] 

More specifically, when there are two
instruction words for the same function, only one of the two
instruction words is employed. For instance, the instruction
word on the uppermost row and the instruction word on
the fourth row from the top of the formats 13 in FIG. 9
are rearranged into the same format among the formats 17.
In this case, only one of the two instruction words
should be employed, while the use of the other should be
inhibited. Alternatively, an instruction word that will
increase the number of alternate wire routes in the
instruction fetch unit 48 may be inhibited beforehand.
For instance, the instruction words on the uppermost
row and the fourth row from the top of the formats 13 in
FIG. 9 have the basic instructions BI and FI in the
opposite orders. Since the circuit components are
arranged on a two-dimensional surface, one of the two
instruction words requires more alternate wire routes
than the other. Accordingly, the instruction word that
requires more alternate wire routes should be inhibited
in advance.
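The restriction described above can be illustrated with a small
check that admits only one ordering of each set of basic
instructions. The text does not state which ordering is retained;
keeping the ordering that already matches the slot order LI, II,
FI, BI is an assumption made here for illustration, and the names
TYPE_ORDER, canonical, and is_allowed are hypothetical.

```python
# Assumed canonical order: the order of the execution-unit slots.
TYPE_ORDER = {"LI": 0, "II": 1, "FI": 2, "BI": 3}

def canonical(word):
    """Return the single permitted ordering of the instructions in word."""
    return sorted(word, key=TYPE_ORDER.__getitem__)

def is_allowed(word):
    """True if word is already in the permitted ordering."""
    return list(word) == canonical(word)

print(is_allowed(["FI", "BI"]))  # prints True
print(is_allowed(["BI", "FI"]))  # prints False
```

Under this assumption a compiler or assembler would emit only the
permitted ordering, so the hardware needs wiring for one of the two
equivalent instruction words rather than both.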

[0055]
As described so far, the circuit size of the
parallel processor 22 can be reduced by restricting in
advance the arrangement of basic instructions contained
in each instruction word to be supplied to the
instruction fetch unit 48.
(Example 2)

FIG. 12 shows the structure of a second
example of the parallel processor in accordance with the
second embodiment of the present invention. As shown in
FIG. 12, the parallel processor 23 of this example has
the same structure as the parallel processor 22 of
Example 1, except that a conversion unit 116 is included
in the instruction issue unit 75. The conversion unit
116 has the same structure and functions as the
conversion unit 115 shown in FIGS. 10 and 11.
[0056] 

FIG. 13 shows the structures of the
instruction fetch unit 49 and the instruction issue unit
75 of the parallel processor 23 shown in FIG. 12. The
instruction fetch unit 49 and the instruction issue unit
75 have the same structures as the instruction fetch unit
46 and the instruction issue unit 72 shown in FIG. 4,
except that the instruction issue unit 75 includes the
conversion unit 116 connected to an instruction register
349. For simplification of the drawing, only the
instruction passages to the two instruction execution
units LU0 and IU0 are shown, and the instruction
passages to the other instruction execution units IU1,
FU0, FU1, and BU0 are omitted in FIG. 13. Also, only two
execution complete signals LUc and IUc0 to be supplied
to the AND gate 380 are shown, and the other execution
complete signals are omitted in FIG. 13.
[0057] 

With the parallel processor of this example,
basic instructions contained in each instruction word
supplied from the instruction register 349 are
rearranged by the conversion unit 116 in accordance with
the arrangement of the instruction execution units. The
rearranged basic instructions are then issued to the
corresponding instruction execution units. Thus, the
wires can be shortened as a whole, and the operation
speed can be increased.
[0058] 

Also, the arrangement of basic instructions
contained in each instruction word to be supplied to the
instruction fetch unit 49 can be restricted in advance
in the same manner as in Example 1. Thus, the circuit
size of the parallel processor 23 can be reduced.
(Example 3) 

FIG. 14 shows the structure of a third example
of the parallel processor in accordance with the second
embodiment of the present invention. As shown in FIG. 14,
the parallel processor 24 has the same structure as the
parallel processor 22 of Example 1 shown in FIG. 7,
except that the instruction fetch unit 50 includes a
first conversion unit 117 and the instruction issue unit
76 includes a second conversion unit 118. The first
conversion unit 117 rearranges basic instructions
contained in each instruction word in accordance with
the arrangement of the instruction execution units. The
second conversion unit 118 rearranges basic instructions
contained in each instruction word in accordance with
the arrangement of the instruction execution units.
[0059] 

FIG. 15 shows the structures of the
instruction fetch unit 50 and the instruction issue unit
76 of the parallel processor 24 shown in FIG. 14.
The instruction fetch unit 50 and the instruction issue
unit 76 have the same structures as the instruction
fetch unit 46 and the instruction issue unit 72 shown in
FIG. 4, except that the instruction fetch unit 50
further includes the first conversion unit 117 connected
to a cutting unit 319 and the instruction issue unit 76
further includes the second conversion unit 118
connected to an instruction register 350. For
simplification of the drawing, only the instruction
passages from the second conversion unit 118 to the two
instruction execution units LU0 and IU0 are shown, and
the instruction passages to the other instruction
execution units IU1, FU0, FU1, and BU0 are omitted in
FIG. 15. Likewise, only two execution complete signals
LUc and IUc0 to be supplied to the AND gate 381 are
shown, and the other execution complete signals are
omitted in FIG. 15.
[0060]

The first conversion unit 117 performs
"preprocessing" of the rearrangement of basic
instructions. The second conversion unit 118 performs
"postprocessing" of the rearrangement of basic
instructions.

In an actual circuit, the processes performed
by the instruction fetch unit 50 and the instruction
issue unit 76 are pipelined so as to improve the
performance of the parallel processor. Because of that,
the difference in processing time between the instruction
fetch unit 50 and the instruction issue unit 76 should
be as small as possible to optimize the pipeline effects.
Therefore, the rearrangement process is divided into the
"preprocessing" and the "postprocessing", so that the
difference in processing time between the instruction
fetch unit 50 and the instruction issue unit 76 is small.
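The benefit of dividing the rearrangement can be seen with a short
arithmetic sketch. All delay values below are hypothetical and are
chosen only to illustrate the reasoning; the text gives no timing
figures. The sketch relies on the general property that a
two-stage pipeline is clocked by its slower stage.

```python
# Hypothetical stage delays in arbitrary time units (not from the text).
FETCH, ISSUE = 4, 3      # work each unit performs without any conversion
DETECT, SELECT = 1, 2    # "preprocessing" and "postprocessing" delays

def cycle_time(fetch_stage, issue_stage):
    """A two-stage pipeline is limited by its slower stage."""
    return max(fetch_stage, issue_stage)

all_in_fetch = cycle_time(FETCH + DETECT + SELECT, ISSUE)
all_in_issue = cycle_time(FETCH, ISSUE + DETECT + SELECT)
split = cycle_time(FETCH + DETECT, ISSUE + SELECT)
print(all_in_fetch, all_in_issue, split)  # prints 7 6 5
```

With these assumed numbers, dividing the rearrangement between the
two units gives the shortest cycle because the two stages end up
with equal delays.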
[0061] 

More specifically, the first conversion unit
117 includes circuits that are the counterparts of the
BI detector block 147 or 148, the FI detector block 149
or 150, the II detector block 151 or 152, and the LI
detector block 153 or 154 shown in FIGS. 10 and 11. The
other circuits shown in FIGS. 10 and 11 are included in
the second conversion unit 118.
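The division of labour between the two conversion units can be
sketched as two functions, one per pipeline stage. This is a
behavioural illustration under assumptions: the function names
preprocess and postprocess, the UNITS table, and the dictionary
passed between the stages are introduced here and are not taken
from the drawings.

```python
UNITS = {"BI": ["BU0"], "FI": ["FU0", "FU1"],
         "II": ["IU0", "IU1"], "LI": ["LU0"]}

def preprocess(word):
    """First conversion unit: detector blocks only (one bit per line)."""
    return {kind: [int(instr == kind) for instr in word] for kind in UNITS}

def postprocess(word, detect):
    """Second conversion unit: route each detected instruction to a unit."""
    routed = {}
    for kind, bits in detect.items():
        lines = [i for i, bit in enumerate(bits) if bit]
        # zip stops at the shorter list, so surplus detections are ignored
        # here; the text does not describe that case.
        for unit, line in zip(UNITS[kind], lines):
            routed[unit] = word[line]
    return routed

word = ["BI", "FI", "FI", "II"]
detect = preprocess(word)           # computed in the instruction fetch unit
print(postprocess(word, detect))    # computed in the instruction issue unit
```

Only the detection bits and the instruction word cross the stage
boundary, which is what allows the two halves to be placed in
different units.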
[0062]

With the parallel processor 24 having the
above structure, the wires can be shortened as a whole,
and the operation speed can be increased. Also, as in
Examples 1 and 2, the circuit size of the parallel
processor 24 may be reduced by restricting in advance
the arrangement of basic instructions contained in each
instruction word to be supplied to the instruction fetch
unit 50.
(Example 4)

FIG. 16 shows the structure of a fourth
example of the parallel processor in accordance with the
second embodiment of the present invention. As shown in
FIG. 16, the parallel processor 25 has the same
structure as the parallel processor 22 of Example 1
shown in FIG. 7, except that the instruction fetch unit
51 includes a conversion unit 119 and the instruction
issue unit 77 includes a judgment unit 104.

[0063] 

FIG. 17 shows the structures of the
instruction fetch unit 51 and the instruction issue unit
77 of the parallel processor 25 shown in FIG. 16. The
instruction fetch unit 51 and the instruction issue unit
77 have the same structures as the instruction fetch
unit 48 and the instruction issue unit 74 shown in FIG.
8, except that the instruction issue unit 77 further
includes the judgment unit 104. The judgment unit 104
determines whether or not a basic instruction to be
issued has data dependency or control dependency with a
supplied basic instruction. The judgment unit 104 also
determines whether or not the basic instruction to be
issued shares resources with the supplied basic
instruction. If the basic instruction to be issued has
data dependency or control dependency, or shares
resources with the supplied basic instruction, the
instruction issue unit 77 issues the basic instruction
after the execution complete signals LUc and IUc0 are
supplied.
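The kind of check performed by the judgment unit can be sketched as
follows. The text states only that data dependency, control
dependency, and resource sharing are examined; the instruction
fields (unit, dest, srcs, is_branch), the specific dependency tests,
and the name must_wait are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass(frozen=True)
class Instr:
    unit: str                      # execution unit the instruction occupies
    dest: Optional[str] = None     # register written, if any
    srcs: Tuple[str, ...] = ()     # registers read
    is_branch: bool = False

def must_wait(candidate, in_flight):
    """True if candidate must wait for an execution complete signal."""
    for prev in in_flight:
        raw = prev.dest is not None and prev.dest in candidate.srcs
        waw = prev.dest is not None and prev.dest == candidate.dest
        war = candidate.dest is not None and candidate.dest in prev.srcs
        control = prev.is_branch             # outcome of a branch is pending
        resource = prev.unit == candidate.unit
        if raw or waw or war or control or resource:
            return True
    return False

load = Instr(unit="LU0", dest="r1", srcs=("r9",))
add_dep = Instr(unit="IU0", dest="r2", srcs=("r1", "r3"))
add_free = Instr(unit="IU0", dest="r4", srcs=("r5", "r6"))
print(must_wait(add_dep, [load]), must_wait(add_free, [load]))  # prints True False
```

In this model an instruction that reads the register being loaded
is held back, while an independent instruction on a different unit
may be issued immediately.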
[0064] 

For simplification of the drawing, only the
instruction passages from an instruction register 351 to
the two instruction execution units LU0 and IU0 are
shown, and the other instruction passages to the
instruction execution units IU1, FU0, FU1, and BU0 are
omitted in FIG. 17. Likewise, only the two execution
complete signals LUc and IUc0 are shown as signals to be
supplied to the judgment unit 104, and the other
execution complete signals are omitted in FIG. 17.
[0065] 

The structure and operation of the conversion
unit 119 are substantially the same as the structure and
operation of the conversion unit 115 shown in FIGS. 10
and 11. The structure and operation of the judgment unit
104 are substantially the same as the structure and
operation of the judgment unit 103 shown in FIG. 6.

The parallel processor of this example
having the above structure achieves the same effects as
the parallel processor of Example 2 of the first
embodiment and the parallel processor of Example 1 of
the second embodiment. In the parallel
processor of this example, the instruction issue unit 77,
which includes the judgment unit 104, enables accurate
and efficient parallel processing of basic instructions,
thereby increasing the reliability of the parallel
processor. Also, the instruction fetch unit 51, which
includes the conversion unit 119, facilitates the basic
instruction issuance to the instruction execution units
by the instruction issue unit 77, thereby increasing the
operation speed.
[0066] 

As in the foregoing examples, the circuit size
of the parallel processor 25 may be reduced by
restricting in advance the arrangement of basic
instructions contained in each instruction word to be
supplied to the instruction fetch unit 51.

(Example 5)

FIG. 18 shows the structure of a fifth example
of the parallel processor in accordance with the second
embodiment of the present invention. As shown in FIG. 18,
the parallel processor 26 has the same structure as the
parallel processor 25 of Example 4, except that the
instruction fetch unit 52 includes no conversion unit
and the instruction issue unit 78 further includes a
conversion unit 120.
[0067] 

FIG. 19 shows the structures of the
instruction fetch unit 52 and the instruction issue unit
78 of the parallel processor 26 shown in FIG. 18. The
instruction fetch unit 52 and the instruction issue unit
78 have the same structures as the instruction fetch
unit 49 and the instruction issue unit 75 shown in FIG.
13, except that the instruction issue unit 78 further
includes the judgment unit 105 connected between an
instruction register 352 and a control unit 375. In
accordance with a supplied basic instruction, the
judgment unit 105 determines whether or not a basic
instruction to be issued has data dependency or
control dependency, and whether or not the basic
instruction to be issued will cause resource sharing.
The judgment results are reported to the control unit
375. If the basic instruction to be issued has data
dependency or control dependency, or causes resource
sharing, the instruction issue unit 78 issues the basic
instruction after the supply of the execution complete
signals LUc and IUc0.

[0068]

For simplification of the drawing, only the
instruction passages from the instruction register 352
to the two instruction execution units LU0 and IU0 are
shown, and the instruction passages to the other
instruction execution units are omitted in FIG. 19.
Likewise, only the two execution complete signals LUc
and IUc0 to be supplied to the judgment unit 105 are
shown in FIG. 19.
[0069] 

The structure and operation of the conversion
unit 120 are the same as the structure and operation of
the conversion unit 115 shown in FIGS. 10 and 11. The
structure and operation of the judgment unit 105 are the
same as the structure and operation of the judgment unit
104 shown in FIG. 16.

The parallel processor of this example having
the above structure achieves the same effects as the
parallel processor of Example 4. The instruction issue
unit 78 including the judgment unit 105 enables accurate
and efficient parallel processing of basic instructions,
thereby increasing the reliability of the operation.
Also, the instruction issue unit 78, which further
includes the conversion unit 120, facilitates the
issuance of basic instructions to the instruction
execution units and increases the operation speed.
[0070] 

Additionally, the circuit size of the parallel
processor 26 may be reduced by restricting in advance
the arrangement of basic instructions contained in each
instruction word to be supplied to the instruction fetch
unit 52, as in the foregoing examples.
(Example 6)

FIG. 20 shows the structure of a sixth example
of the parallel processor in accordance with the second
embodiment of the present invention. As shown in FIG. 20,
the parallel processor 27 has the same structure as the
parallel processor 24 of Example 3 shown in FIG. 14,
except that the instruction issue unit 79 further
includes a judgment unit 106.
[0071]

FIG. 21 shows the structures of the
instruction fetch unit 53 and the instruction issue unit
79 of the parallel processor 27 shown in FIG. 20. The
instruction fetch unit 53 and the instruction issue unit
79 have the same structures as the instruction fetch
unit 50 and the instruction issue unit 76 shown in FIG.
15, except that the instruction issue unit 79 further
includes the judgment unit 106 connected between an
instruction register 353 and a control unit 376. Based
on a supplied basic instruction, the judgment unit 106
determines whether or not a basic instruction to be
issued has data dependency or control dependency, or
causes resource sharing. The judgment results are
reported to the control unit 376. If the basic
instruction to be issued has data dependency or
control dependency, or causes resource sharing, the
instruction issue unit 79 issues the basic instruction
only after the execution complete signals LUc and IUc0
are supplied.

[0072]
For simplification of the drawing, only the
instruction passages from the instruction register 353
to the two instruction execution units LU0 and IU0 are
shown, and the instruction passages to the other
instruction execution units IU1, FU0, FU1, and BU0 are
omitted in FIG. 21. Likewise, only the two execution
complete signals LUc and IUc0 are shown in FIG. 21.
[0073] 

The structures and operations of a first
conversion unit 121 and a second conversion unit 122 are
the same as the structures and operations of the first
conversion unit 117 and the second conversion unit 118.
The structure and operation of the judgment unit 106 are
the same as the structure and operation of the judgment
unit 103 shown in FIG. 6.

The parallel processor 27 of this example
having the above structure can achieve the effects of both
the parallel processor of Example 2 of the first
embodiment and the parallel processor of Example 3 of
the second embodiment. More specifically, the
instruction issue unit 79 including the judgment unit
106 enables accurate and efficient parallel processing
of basic instructions, thereby increasing the
reliability of the operation. Also, the instruction
fetch unit 53 including the first conversion unit 121
and the instruction issue unit 79 including the second
conversion unit 122 facilitate the issuance of basic
instructions from the instruction issue unit 79 to the
instruction execution units and increase the operation
speed.
[0074] 

Additionally, the circuit size of the parallel
processor 27 may be reduced by restricting in advance
the arrangement of basic instructions contained in each
instruction word to be supplied to the instruction fetch
unit 53, as in the foregoing examples.
[Third Embodiment]

As shown in FIGS. 22 to 27, parallel
processors 28 to 33 in accordance with a third
embodiment of the present invention each comprise an
instruction fetch unit 54-59 connected to the memory 12,
an instruction issue unit 80-85 connected to the
instruction fetch unit 54-59, instruction execution
units LU0, IU0, IU1, FU0, FU1, MU0, MU1, and BU0, and a
register unit 100 connected to all the instruction
execution units. Here, the instruction execution units
MU0 and MU1 are special-purpose arithmetic instruction
execution units that execute special-purpose arithmetic
instructions. When the execution of special-purpose
arithmetic instructions is completed, the instruction
execution units MU0 and MU1 notify the instruction issue
unit 80-85 of the completion of the execution.
[0075] 

In the following, the parallel processors in
accordance with the third embodiment of the present
invention will be described by way of a case where the
maximum basic instruction word length contained in one
instruction word is 2. It should be understood that the
same effects can be obtained in a case where the maximum
basic instruction word length contained in one instruction
word is 3 or more.
(Example 1)

FIG. 22 shows the structure of a first example
of the parallel processor in accordance with the third
embodiment of the present invention. As shown in FIG. 22,
the parallel processor 28 comprises a conversion unit
123 in the instruction fetch unit 54. The structure and
the operation of the conversion unit 123 are the same as
those of the conversion unit 115 of Example 1 of the second
embodiment. More specifically, the conversion unit 123
rearranges basic instructions contained in each
instruction word in accordance with the arrangement of
the instruction execution units, and then supplies the
rearranged basic instructions to the instruction issue
unit 80.
[0076]

The parallel processor 28 having the above
structure can achieve the same effects as the parallel
processor 22 of Example 1 of the second embodiment. In
other words, the issuance of basic instructions from the
instruction issue unit 80 to the instruction execution
units can be facilitated, and the operation speed can be
increased.
[0077] 

Additionally, the circuit size of the parallel
processor 28 may be reduced by restricting in advance
the arrangement of basic instructions contained in each
instruction word to be supplied to the instruction
execution units, as in the foregoing examples.

(Example 2)

FIG. 23 shows the structure of a second
example of the parallel processor in accordance with the
third embodiment of the present invention. As shown in
FIG. 23, the parallel processor 29 has the same
structure as the parallel processor 23 shown in FIG. 12,
comprising a conversion unit 124 in the instruction
issue unit 81. The structure and operation of the
conversion unit 124 are the same as the structure and
operation of the conversion unit 115 shown in FIGS. 10
and 11.
[0078]

In the parallel processor 29 of this example,
the instruction issue unit 81 issues each basic
instruction to the corresponding one of the instruction
execution units only after the conversion unit 124
rearranges the basic instructions, which are contained
in each instruction word supplied from the instruction
fetch unit 55, in accordance with the arrangement of the
instruction execution units. Thus, wires can be
shortened as a whole, and the operation speed can be
increased. Additionally, the circuit size of the
parallel processor 29 may be reduced by restricting in
advance the arrangement of basic instructions contained
in each instruction word to be supplied to the
instruction fetch unit 55, as in the foregoing
examples.
(Example 3) 

FIG. 24 shows the structure of a third example
of the parallel processor in accordance with the
third embodiment of the present invention. As shown in
FIG. 24, the parallel processor 30 has substantially the
same structure as the parallel processor 24 shown in FIG.
14. The instruction fetch unit 56 includes a first
conversion unit 125 that rearranges basic instructions
contained in each fetched instruction word in accordance
with the arrangement of the instruction execution units.
The instruction issue unit 82 includes a second
conversion unit 126 that further rearranges basic
instructions contained in each instruction word supplied
from the instruction fetch unit 56 in accordance with
the arrangement of the instruction execution units.
[0079] 

The first conversion unit 125 performs
"preprocessing" of the rearrangement of basic
instructions, and the second conversion unit 126
performs "postprocessing" of the rearrangement of the
basic instructions.

In an actual circuit, the processes in the
instruction fetch unit 56 and the instruction issue unit
82 are pipelined in order to improve the performance of
the parallel processor. Because of that, the difference
in processing time between the instruction fetch unit 56
and the instruction issue unit 82 should be as small as
possible to optimize the pipeline effects. Therefore,
the rearrangement process is divided into the
"preprocessing" and "postprocessing", so that the
difference in processing time between the instruction
fetch unit 56 and the instruction issue unit 82 is small.
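
The reasoning behind dividing the rearrangement between the two pipelined units can be sketched numerically as follows. This is a minimal illustration under stated assumptions, not a description of the actual circuit: the delay values are hypothetical and in arbitrary time units, the names `cycle_time` and `split_rearrangement` are introduced here, and the rearrangement delay is assumed to be freely divisible between "preprocessing" and "postprocessing". The sketch relies only on the general fact that a pipeline is clocked by its slowest stage.

```python
from typing import Tuple


def cycle_time(fetch_delay: float, issue_delay: float) -> float:
    """A two-stage pipeline is clocked by its slower stage."""
    return max(fetch_delay, issue_delay)


def split_rearrangement(fetch_base: float, issue_base: float,
                        rearrange: float) -> Tuple[float, float]:
    """Return (pre, post) shares of the rearrangement delay that bring the
    two stage delays as close together as possible.

    Assumes, for illustration only, that the delay can be divided freely.
    """
    # Solve fetch_base + pre == issue_base + (rearrange - pre) for pre.
    pre = (issue_base - fetch_base + rearrange) / 2.0
    # Clamp: neither share can be negative or exceed the whole delay.
    pre = min(max(pre, 0.0), rearrange)
    return pre, rearrange - pre


# Hypothetical delays in arbitrary time units (not taken from the specification).
FETCH_BASE, ISSUE_BASE, REARRANGE = 3.0, 4.0, 5.0

pre, post = split_rearrangement(FETCH_BASE, ISSUE_BASE, REARRANGE)
all_in_fetch = cycle_time(FETCH_BASE + REARRANGE, ISSUE_BASE)   # rearrange only in fetch
all_in_issue = cycle_time(FETCH_BASE, ISSUE_BASE + REARRANGE)   # rearrange only in issue
divided = cycle_time(FETCH_BASE + pre, ISSUE_BASE + post)       # pre/post split
print(all_in_fetch, all_in_issue, divided)
```

With these assumed numbers, placing the whole rearrangement in the fetch stage gives a cycle of 8.0, placing it in the issue stage gives 9.0, and dividing it 3.0/2.0 between the stages gives 6.0, because both stages then take the same time.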
[0080] 

By the parallel processor of this example
having the above structure, wires can be shortened as a
whole, and the operation speed can be increased.

Additionally, the circuit size of the parallel
processor 30 may be reduced by restricting in advance
the arrangement of basic instructions contained in each
instruction word to be supplied to the instruction fetch
unit 56, as in the foregoing examples.

(Example 4) 

FIG. 25 shows the structure of a fourth
example of the parallel processor in accordance with
the third embodiment of the present invention. As shown
in FIG. 25, the parallel processor 31 has the same
structure as the parallel processor 25 shown in FIG. 16.
The instruction fetch unit 57 includes a conversion unit
127, and the instruction issue unit 83 includes a
judgment unit 107.

[0081] 

The structure and operation of the conversion
unit 127 are the same as the structure and operation of
the conversion unit 115 shown in FIGS. 10 and 11. The
structure and operation of the judgment unit 107 are the
same as the structure and operation of the judgment unit
103 shown in FIG. 6.

By the parallel processor of this example
having the above structure, the same effects as obtained
by the parallel processor 25 of Example 4 of the second
embodiment can be obtained. More specifically, the
instruction issue unit 83 including the judgment unit 107
enables accurate and efficient parallel processing of basic
instructions, thereby increasing the reliability of the
operation. The instruction fetch unit 57 including the
conversion unit 127 facilitates the issuance of basic
instructions to the instruction execution units, thereby
increasing the operation speed.
[0082] 

Additionally, the circuit size of the parallel
processor 31 may be reduced by restricting in advance
the arrangement of basic instructions contained in each
instruction word to be supplied to the instruction fetch
unit 57, as in the foregoing examples.
(Example 5)

FIG. 26 shows the structure of a fifth example
of the parallel processor in accordance with the third
embodiment of the present invention. As shown in FIG. 26,
the parallel processor 32 has the same structure as the
parallel processor 26 of Example 5 of the second
embodiment shown in FIG. 18. The instruction issue unit
84 includes a conversion unit 128 and a judgment unit
108.
[0083] 

The structure and operation of the conversion
unit 128 are the same as the structure and operation of
the conversion unit 115 shown in FIGS. 10 and 11. The
structure and operation of the judgment unit 108 are
the same as the structure and operation of the judgment
unit 103.

By the parallel processor of this example
having the above structure, the same effects as the
parallel processor 26 of Example 5 of the second
embodiment can be achieved. More specifically, the
instruction issue unit 84 including the judgment unit
108 enables accurate and efficient parallel processing
of basic instructions, thereby increasing the
reliability of the operation. The instruction issue unit
84 further including the conversion unit 128 facilitates
the issuance of basic instructions to the instruction
execution units, thereby increasing the operation speed.
[0084] 

Additionally, the circuit size of the parallel
processor 32 may be reduced by restricting in advance
the arrangement of basic instructions contained in each
instruction word to be supplied to the instruction fetch
unit 58, as in the foregoing examples.
(Example 6) 

FIG. 27 shows the structure of a sixth example
of the parallel processor in accordance with the third
embodiment of the present invention. As shown in FIG. 27,
the parallel processor 33 has the same structure as the
parallel processor 27 shown in FIG. 20.
[0085] 

The structures and operations of a first
conversion unit 129 and a second conversion unit 130 are
the same as the structures and operations of the first
conversion unit 117 and the second conversion unit 118
shown in FIG. 14. The structure and operation of a
judgment unit 109 are the same as the structure and
operation of the judgment unit 103.

By the parallel processor of this example
having the above structure, the same effects as obtained
by the parallel processor 27 of Example 6 of the second
embodiment can be obtained. More specifically, the
instruction issue unit 85 including the judgment unit
109 enables accurate and efficient parallel processing
of basic instructions, thereby increasing the
reliability of the operation. The instruction fetch unit
59 including the first conversion unit 129 and the
instruction issue unit 85 including the second
conversion unit 130 facilitate the issuance of basic
instructions from the instruction issue unit 85 to the
instruction execution units, thereby increasing the
operation speed.
[0086]

Additionally, the circuit size of the parallel
processor 33 may be reduced by restricting in advance
the arrangement of basic instructions contained in each
instruction word to be supplied to the instruction fetch
unit 59, as in the foregoing examples.

[Fourth Embodiment]

As shown in FIGS. 28 to 33, parallel
processors 34 to 39 in accordance with a fourth embodiment
of the present invention each comprise an instruction
fetch unit 60-65 connected to the memory 12, an
instruction issue unit 86-91 connected to the
instruction fetch unit 60-65, instruction execution
units LU0, LU1, IU0, IU1, FU0, FU1, BU0, and BU1
connected to the instruction issue unit 86-91, and a
register unit 101 connected to all the instruction
execution units. In this embodiment, the instruction
execution unit LU1 is a load store instruction execution
unit that executes load instructions and store
instructions. The instruction execution unit BU1 is a
branch instruction execution unit that executes branch
instructions. When the execution is completed, the
instruction execution unit BU1 notifies the instruction
issue unit 86-91 of the end of the execution.
[0087]

In the following, the parallel processor in
accordance with the fourth embodiment of the present
invention will be described by way of examples in which
the maximum basic instruction word length contained in
each instruction word is 4. In FIGS. 28 to 33, the
maximum basic instruction word length being 4 is
indicated by four arrows from the instruction fetch unit
60-65 to the instruction issue unit 86-91. However, it
should be understood that the maximum basic instruction
word length in the fourth embodiment is not limited to 4.
(Example 1) 

FIG. 28 shows the structure of a first example
of the parallel processor in accordance with the fourth 
embodiment of the present invention. As shown in FIG. 28, 
the parallel processor 34 comprises a conversion unit 
131 in the instruction fetch unit 60. The structure and 
operation of the conversion unit 131 are the same as the 
structure and operation of the conversion unit 115 of 
Example 1 of the second embodiment. More specifically, 
the conversion unit 131 rearranges basic instructions 
contained in each fetched instruction word, in 
accordance with the arrangement of the instruction 
execution units, and supplies the rearranged basic 
instructions to the instruction issue unit 86. 
[0088]

By the parallel processor 34 having the above
structure, the same effects as obtained by the parallel
processor 22 of Example 1 of the second embodiment can
also be obtained.

More specifically, the issuance of basic
instructions from the instruction issue unit 86 to the
instruction execution units can be facilitated, and the
operation speed can be increased accordingly.
[0089] 

Additionally, the circuit size of the parallel
processor 34 may be reduced by restricting in advance
the arrangement of basic instructions contained in each
instruction word to be supplied to the instruction fetch
unit 60, as in the foregoing embodiments.
(Example 2)

FIG. 29 shows the structure of a second
example of the parallel processor in accordance with the
fourth embodiment of the present invention. As shown in
FIG. 29, the parallel processor 35 has the same
structure as the parallel processor 23 shown in FIG. 12,
in that the instruction issue unit 87 includes a
conversion unit 132. The structure and operation of the
conversion unit 132 are the same as the structure and
operation of the conversion unit 115 shown in FIGS. 10
and 11.
[0090]

In the parallel processor 35 of this example,
the instruction issue unit 87 rearranges basic
instructions contained in each instruction word supplied
from the instruction fetch unit 61, in accordance with the
arrangement of the instruction execution units, and then
supplies the rearranged basic instructions to the
instruction execution units. Thus, wires can be
shortened as a whole, and the operation speed can be
increased.

Additionally, the circuit size of the parallel
processor 35 may be reduced by restricting in advance
the arrangement of basic instructions contained in each
instruction word to be supplied to the instruction fetch
unit 61, as in the foregoing examples.
(Example 3) 

FIG. 30 shows the structure of a third example
of the parallel processor in accordance with the fourth
embodiment of the present invention. As shown in FIG. 30,
the parallel processor 36 has the same structure as the
parallel processor 24 shown in FIG. 14. The instruction
fetch unit 62 of this parallel processor 36 includes a
first conversion unit 133 that rearranges basic
instructions contained in each fetched instruction word,
in accordance with the arrangement of the instruction
execution units. The instruction issue unit 88 of the
parallel processor 36 includes a second conversion unit
134 that further rearranges the basic instructions
contained in each instruction word supplied from the
instruction fetch unit 62, in accordance with the
arrangement of the instruction execution units.
[0091] 

The first conversion unit 133 performs
"preprocessing" of the rearrangement of basic
instructions, and the second conversion unit 134
performs "postprocessing" of the rearrangement of the
basic instructions.

To improve the performance of the parallel
processor in an actual circuit, the processes in the
instruction fetch unit 62 and the instruction issue unit
88 are pipelined. Because of that, the difference in
processing time between the instruction fetch unit 62 and
the instruction issue unit 88 should be as small as
possible to optimize the pipeline effects. Therefore,
the rearrangement process is divided into the
"preprocessing" and "postprocessing", so that the
difference in processing time between the instruction
fetch unit 62 and the instruction issue unit 88 is small.
[0092] 

By the parallel processor 36 of this example
having the above structure, wires can be shortened as a
whole, and the operation speed can be increased.

Additionally, the circuit size of the parallel
processor 36 may be reduced by restricting in advance
the arrangement of basic instructions contained in each
instruction word to be supplied to the instruction fetch
unit 62, as in the foregoing examples.
(Example 4) 

FIG. 31 shows the structure of a fourth
example of the parallel processor in accordance with the
fourth embodiment of the present invention. As shown in
FIG. 31, the parallel processor 37 has the same
structure as the parallel processor 25 shown in FIG. 16,
in that the instruction fetch unit 63 includes a
conversion unit 135 and the instruction issue unit 89
includes a judgment unit 110.
[0093] 

The structure and operation of the conversion
unit 135 are the same as the structure and operation of
the conversion unit 115 shown in FIGS. 10 and 11. On the
other hand, the structure and operation of the judgment
unit 110 are the same as the structure and operation of
the judgment unit 103 shown in FIG. 6.

By the parallel processor 37 of this example
having the above structure, the same effects as obtained
by the parallel processor 25 of Example 4 of the second
embodiment can be obtained. More specifically, the
instruction issue unit 89 including the judgment unit
110 enables accurate and efficient parallel processing
of basic instructions, thereby increasing the
reliability of the operation. The instruction fetch unit
63 including the conversion unit 135 facilitates the
issuance of basic instructions from the instruction
issue unit 89 to the instruction execution units,
thereby increasing the operation speed.
[0094] 

Additionally, the circuit size of the parallel
processor 37 may be reduced by restricting in advance
the arrangement of basic instructions contained in each
instruction word to be supplied to the instruction fetch
unit 63, as in the foregoing examples.
(Example 5) 

FIG. 32 shows the structure of a fifth example
of the parallel processor in accordance with the fourth
embodiment of the present invention. As shown in FIG. 32,
the parallel processor 38 has the same structure as the
parallel processor 26 of Example 5 of the second
embodiment shown in FIG. 18, in that the instruction
issue unit 90 includes a conversion unit 136 and a
judgment unit 111.
[0095]

The structure and operation of the conversion
unit 136 are the same as the structure and operation of
the conversion unit 115 shown in FIGS. 10 and 11. On the
other hand, the structure and operation of the judgment
unit 111 are the same as the structure and operation of
the judgment unit 103 shown in FIG. 6.

By the parallel processor of this example
having the above structure, the same effects as obtained
by the parallel processor 26 of Example 5 of the second
embodiment can be obtained. More specifically, the
instruction issue unit 90 including the judgment unit
111 enables accurate and efficient parallel processing
of basic instructions, thereby increasing the
reliability of the operation. The instruction issue unit
90 further including the conversion unit 136 facilitates
the issuance of basic instructions to the instruction
execution units, thereby increasing the operation speed.
[0096] 

Additionally, the circuit size of the parallel
processor 38 may be reduced by restricting in advance
the arrangement of basic instructions contained in each
instruction word to be supplied to the instruction fetch
unit 64, as in the foregoing examples.
(Example 6)

FIG. 33 shows the structure of a sixth example
of the parallel processor in accordance with the fourth
embodiment of the present invention. As shown in FIG. 33,
the parallel processor 39 has the same structure as the
parallel processor 27 shown in FIG. 20.
[0097] 

The structures and operations of a first
conversion unit 137 and a second conversion unit 138 are
the same as the structures and operations of the first
conversion unit 117 and the second conversion unit 118
shown in FIG. 14. On the other hand, the structure and
operation of a judgment unit 112 are the same as the
structure and operation of the judgment unit 103 shown
in FIG. 6.

By the parallel processor 39 of this example
having the above structure, the same effects as obtained
by the parallel processor of Example 6 of the second
embodiment can be obtained. More specifically, the
instruction issue unit 91 including the judgment unit
112 enables accurate and efficient parallel processing
of basic instructions, thereby increasing the
reliability of the operation. The instruction fetch unit
65 including the first conversion unit 137 and the
instruction issue unit 91 further including the second
conversion unit 138 facilitate the issuance of basic
instructions from the instruction issue unit 91 to the
instruction execution units, thereby increasing the
operation speed.
[0098] 

Additionally, the circuit size of the parallel
processor 39 may be reduced by restricting in advance
the arrangement of basic instructions contained in each
instruction word to be supplied to the instruction fetch
unit 65, as in the foregoing examples.

[Fifth Embodiment]

As shown in FIGS. 34 to 39, parallel
processors 40 to 45 in accordance with a fifth
embodiment of the present invention each comprise an
instruction fetch unit 66-71 connected to the memory 12,
an instruction issue unit 92-97 connected to the
instruction fetch unit 66-71, instruction execution
units LU0, LU1, IU0, IU1, FU0, FU1, MU0, MU1, BU0, and
BU1, and a register unit 102 connected to all the
instruction execution units.
[0099]

In the following, the parallel processor in
accordance with the fifth embodiment of the present
invention will be described by way of examples in which
the maximum basic instruction word length contained in
each instruction word is 4. In FIGS. 34 to 39, the
maximum basic instruction word length being 4 is
indicated by four arrows extending from the instruction
fetch unit 66-71 to the instruction issue unit 92-97.

It should be understood that the maximum basic
instruction word length is not limited to 4 in this
embodiment.
(Example 1) 

FIG. 34 shows the structure of a first example
of the parallel processor in accordance with the fifth
embodiment of the present invention. As shown in FIG. 34,
the parallel processor 40 comprises a conversion unit
139 in the instruction fetch unit 66. The structure and
operation of the conversion unit 139 are the same as the
structure and operation of the conversion unit 115 of
Example 1 of the second embodiment of the present
invention. The conversion unit 139 rearranges basic
instructions contained in each fetched instruction word,
in accordance with the arrangement of the instruction
execution units, and then supplies the rearranged basic
instructions to the instruction issue unit 92.
[0100] 

By the parallel processor 40 having the above
structure, the same effects as obtained by the parallel
processor 22 of Example 1 of the second embodiment can
be obtained.

More specifically, the issuance of basic
instructions from the instruction issue unit 92 to the
instruction execution units can be facilitated, and the
operation speed can be increased accordingly.
[0101]

Additionally, the circuit size of the parallel
processor 40 may be reduced by restricting in advance
the arrangement of basic instructions contained in each
instruction word to be supplied to the instruction fetch
unit 66, as in the foregoing embodiments.
(Example 2) 

FIG. 35 shows the structure of a second
example of the parallel processor in accordance with the
fifth embodiment of the present invention. As shown in
FIG. 35, the parallel processor 41 has the same
structure as the parallel processor 23 shown in FIG. 12,
in that the instruction issue unit 93 includes a
conversion unit 140. The structure and operation of the
conversion unit 140 are the same as the structure and
operation of the conversion unit 115 shown in FIGS. 10
and 11.
[0102] 

In the parallel processor 41 of this example,
the instruction issue unit 93 rearranges basic
instructions contained in each instruction word supplied
from the instruction fetch unit 67, and then supplies
the rearranged basic instructions to the instruction
execution units. Thus, wires can be shortened as a whole,
and the operation speed can be increased.

Additionally, the circuit size of the parallel
processor 41 may be reduced by restricting in advance
the arrangement of basic instructions contained in each
instruction word to be supplied to the instruction fetch
unit 67, as in the foregoing examples.
(Example 3) 

FIG. 36 shows the structure of a third example
of the parallel processor in accordance with the fifth
embodiment of the present invention. As shown in FIG. 36,
the parallel processor 42 of this example has the same
structure as the parallel processor 24 shown in FIG. 14.
The instruction fetch unit 68 of the parallel processor
42 includes a first conversion unit 141 that rearranges
basic instructions contained in each fetched instruction
word in accordance with the arrangement of the
instruction execution units. The instruction issue unit
94 of the parallel processor 42 includes a second
conversion unit 142 that further rearranges basic
instructions contained in each instruction word supplied
from the instruction fetch unit 68 in accordance with
the arrangement of the instruction execution units.
[0103]

The first conversion unit 141 performs
"preprocessing" of the rearrangement of basic
instructions, and the second conversion unit 142
performs "postprocessing" of the rearrangement of the
basic instructions.

In order to improve the performance of the
parallel processor in an actual circuit, the
processes in the instruction fetch unit 68 and the
instruction issue unit 94 are pipelined. Because of that,
the difference in processing time between the instruction
fetch unit 68 and the instruction issue unit 94 should
be as small as possible to optimize the pipeline effects.
Therefore, the rearrangement process is divided into the
"preprocessing" and "postprocessing", so that the
difference in processing time between the instruction
fetch unit 68 and the instruction issue unit 94 can be
small.
[0104] 

By the parallel processor 42 of this example
having the above structure, wires can be shortened as a
whole, and the operation speed can be increased.

Additionally, the circuit size of the parallel
processor 42 may be reduced by restricting in advance
the arrangement of basic instructions contained in each
instruction word to be supplied to the instruction fetch
unit 68, as in the foregoing examples.
(Example 4) 

FIG. 37 shows the structure of a fourth
example of the parallel processor in accordance with the
fifth embodiment of the present invention. As shown in
FIG. 37, the parallel processor 43 has the same
structure as the parallel processor 25 shown in FIG. 16,
in that the instruction fetch unit 69 includes a
conversion unit 143 and the instruction issue unit 95
includes a judgment unit 113.
[0105] 

The structure and operation of the conversion
unit 143 are the same as the structure and operation of
the conversion unit 115 shown in FIGS. 10 and 11. On the
other hand, the structure and operation of the judgment
unit 113 are the same as the structure and operation of
the judgment unit 103 shown in FIG. 6.

By the parallel processor 43 of this example
having the above structure, the same effects as obtained
by the parallel processor 25 of Example 4 of the second
embodiment can be obtained. More specifically, the
instruction issue unit 95 including the judgment unit
113 enables accurate and efficient parallel processing
of basic instructions, thereby increasing the
reliability of the operation. The instruction fetch unit
69 including the conversion unit 143 facilitates the
issuance of basic instructions from the instruction
issue unit 95 to the instruction execution units,
thereby increasing the operation speed.
[0106] 

Additionally, the circuit size of the parallel
processor 43 may be reduced by restricting in advance
the arrangement of basic instructions contained in each
instruction word to be supplied to the instruction fetch
unit 69, as in the foregoing examples.
(Example 5) 

FIG. 38 shows the structure of a fifth example
of the parallel processor in accordance with the fifth
embodiment of the present invention. As shown in FIG. 38,
the parallel processor 44 of this example has the same
structure as the parallel processor 26 of Example 5 of
the second embodiment shown in FIG. 18, in that the
instruction issue unit 96 includes a conversion unit 144
and a judgment unit 114.
[0107] 

The structure and operation of the conversion
unit 144 are the same as the structure and operation of
the conversion unit 115 shown in FIGS. 10 and 11. On the
other hand, the structure and operation of the judgment
unit 114 are the same as the structure and operation of
the judgment unit 103 shown in FIG. 6.

By the parallel processor 44 of this example
having the above structure, the same effects as obtained
by the parallel processor 26 of Example 5 of the second
embodiment can be obtained. More specifically, the
instruction issue unit 96 including the judgment unit
114 enables accurate and efficient parallel processing
of basic instructions, thereby increasing the
reliability of the operation. The instruction issue unit
96 further including the conversion unit 144 facilitates
the issuance of basic instructions to the instruction
execution units, thereby increasing the operation speed.
[0108] 

Additionally, the circuit size of the parallel
processor 44 may be reduced by restricting in advance
the arrangement of basic instructions contained in each
instruction word to be supplied to the instruction fetch
unit 70, as in the foregoing examples.
(Example 6)

FIG. 39 shows the structure of a sixth example
of the parallel processor in accordance with the fifth
embodiment of the present invention. As shown in FIG. 39,
the parallel processor 45 of this example has the same
structure as the parallel processor 27 shown in FIG. 20.
The instruction fetch unit 71 includes a first
conversion unit 145, and the instruction issue unit 97
includes a second conversion unit 146 and a judgment
unit 219.
[0109]

The structures and operations of the first
conversion unit 145 and the second conversion unit 146
are the same as the structures and operations of the
first conversion unit 117 and the second conversion unit
118 shown in FIG. 14. On the other hand, the structure
and operation of the judgment unit 219 are the same as
the structure and operation of the judgment unit 103
shown in FIG. 6.

By the parallel processor 45 of this example
having the above structure, the same effects as obtained
by the parallel processor 27 of Example 6 of the second
embodiment can be obtained. More specifically, the
instruction issue unit 97 including the judgment unit
219 enables accurate and efficient parallel processing
of basic instructions, thereby increasing the
reliability of the operation. The instruction fetch unit
71 including the first conversion unit 145 and the
instruction issue unit 97 including the second
conversion unit 146 facilitate the issuance of basic
instructions from the instruction issue unit 97 to the
instruction execution units, thereby increasing the
operation speed.
[0110] 

Additionally, the circuit size of the parallel 
processor 45 may be reduced by restricting in advance 
the arrangement of basic instructions contained in each 
instruction word to be supplied to the instruction fetch 
unit 71, as in the foregoing examples. 

Finally, the means for solving the problems addressed
by the present invention are appended below:

(1) A parallel processor that performs 
parallel processing of one or more basic instructions 
contained in each of instruction words delimited by 
instruction delimiting information, said parallel 
processor characterized by comprising: 

a plurality of instruction execution units that 
perform processes corresponding to supplied basic 
instructions in parallel; 

an instruction fetch unit that fetches the 
instruction words one by one in accordance with the 
instruction delimiting information; and 

an instruction issue unit that selectively issues 
each of the basic instructions supplied from the 
instruction fetch unit to one of the instruction 
execution units to execute an issued basic instruction.

(2) The parallel processor according to (1),
characterized in that the plurality of instruction
execution units all have the same structure.

(3) The parallel processor according to (1),
characterized in that:

at least two of the instruction execution units
have different structures from each other; and

the instruction fetch unit rearranges the basic
instructions contained in each of the fetched
instruction words, in accordance with the arrangement of the
plurality of instruction execution units, and then
supplies the rearranged basic instructions to the
instruction issue unit.



(4) The parallel processor according to (1),
characterized in that:

at least two of the instruction execution units
have different structures from each other; and

the instruction issue unit rearranges the basic
instructions contained in each of the instruction words
supplied from the instruction fetch unit, in accordance
with the arrangement of the plurality of instruction
execution units, and then supplies the rearranged basic
instructions to the instruction execution units.

(5) The parallel processor according to (1),
characterized in that:



at least two of the instruction execution units 
have different structures from each other; 

the instruction fetch unit rearranges the basic 
instructions contained in each of the fetched 
instruction words, in accordance with arrangement of the 
instruction execution units, and then supplies the 
rearranged basic instructions to the instruction issue 
unit; and 

the instruction issue unit further rearranges the
basic instructions contained in each of the instruction
words supplied from the instruction fetch unit, in
accordance with the arrangement of the instruction
execution units, and then supplies the rearranged basic
instructions to the instruction execution units.

(6) The parallel processor according to any 
one of (3) to (5), characterized in that: 

at least two of the instruction execution units 
have different structures from each other; and 

the instruction fetch unit fetches an instruction
word that contains basic instructions arranged in advance
in accordance with the arrangement of the instruction
execution units.

According to this configuration, since the
instruction fetch unit fetches an instruction word
containing basic instructions arranged in advance in
accordance with the arrangement of the instruction
execution units, the circuit size of the parallel
processor can be reduced.

(7) The parallel processor according to any
one of (1) to (6), characterized in that,

depending on the type of a basic instruction being
currently executed by one of the instruction execution
units, the instruction issue unit issues a next basic
instruction before the execution of the basic
instruction being currently executed is completed.

(8) The parallel processor according to (7),
characterized in that, if a supplied basic instruction
does not have data dependency or control dependency, or
does not share resources with a basic instruction being
currently executed by one of the instruction execution
units, the instruction issue unit issues the supplied
basic instruction before the execution of the basic
instruction being currently executed is completed.

According to this configuration, parallel
processing of the basic instructions can be performed
more reliably and effectively.
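
As an illustration of the issue condition described in (8), the following is a minimal Python sketch, not the circuitry of the judgment unit itself. It reads the condition as requiring that the supplied basic instruction have no data dependency on, no control dependency on, and no shared resource with any basic instruction still being executed. That reading, the `BasicInstruction` record, its fields, and the function names are all assumptions introduced here for illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class BasicInstruction:
    """Hypothetical record for one basic instruction (not from the specification)."""
    name: str
    reads: frozenset = field(default_factory=frozenset)   # registers read
    writes: frozenset = field(default_factory=frozenset)  # registers written
    unit: str = "IU0"         # execution unit (resource) the instruction occupies
    is_branch: bool = False   # an unresolved branch is treated as a control dependency


def has_data_dependency(new, running):
    """True if the two instructions conflict on a register (RAW, WAR or WAW)."""
    raw = new.reads & running.writes
    war = new.writes & running.reads
    waw = new.writes & running.writes
    return bool(raw or war or waw)


def can_issue_early(new, in_flight):
    """Return True if `new` may be issued before the instructions in
    `in_flight` have completed, under the assumed reading of (8)."""
    for running in in_flight:
        if running.is_branch:                   # control dependency
            return False
        if has_data_dependency(new, running):   # data dependency
            return False
        if new.unit == running.unit:            # shared resource
            return False
    return True
```

Under these assumptions, an integer addition that touches only its own registers could be issued while a load is still in progress, whereas an instruction that reads the register being loaded would have to wait.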
[0111] 

[Advantages of the Invention] 

As explained above, according to the present
inventions, the instruction fetch unit fetches
instruction words one by one in accordance with the
instruction delimiting information, which makes the
length of each instruction word variable. In addition,
the instruction issue unit issues each of the basic
instructions contained in the instruction words fetched
by the instruction fetch unit to a corresponding one of
the instruction execution units. As a result, basic
instructions can be contained in the instruction words
more efficiently, and the parallel processing capacity
of the parallel processor can be improved.
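
To make the role of the instruction delimiting information concrete, the following Python sketch groups a stream of basic instructions into variable-length instruction words. It assumes, purely for illustration, that each basic instruction carries a one-bit flag marking the last basic instruction of its instruction word; the actual formats are those shown in FIG. 5, and the function name `split_instruction_words` is hypothetical.

```python
def split_instruction_words(stream):
    """Group basic instructions into instruction words.

    `stream` yields (basic_instruction, is_last) pairs, where `is_last` is the
    assumed delimiting flag marking the final basic instruction of a word.
    """
    words, current = [], []
    for basic, is_last in stream:
        current.append(basic)
        if is_last:
            # The delimiting flag closes the current instruction word.
            words.append(current)
            current = []
    if current:
        raise ValueError("stream ended inside an instruction word")
    return words
```

Because a word ends wherever the flag is set, a word holding a single basic instruction occupies one slot instead of being padded out to a fixed width, which is the sense in which the word length becomes variable.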
[0112] 

Furthermore, in the above configuration, when at
least two of the instruction execution units have
different configurations from each other, no excessive
circuit scale is required for executing the instruction
words, and the parallel processor can be downsized.

In addition, the circuit size of the parallel 
processor can be reduced and faster operation speed can 
be achieved when the instruction fetch unit rearranges 
the basic instructions contained in each of the fetched 
instruction words, in accordance with arrangement of the 
instruction execution units, and then supplies the 
rearranged basic instructions to the instruction issue 
unit.
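
The rearrangement described above can be pictured with a short Python sketch. It assumes one execution unit of each kind, laid out in the order load/store, integer, floating-point, branch, and assumes each basic instruction is tagged with the kind of unit it needs. The slot order, the tags, and the function name `rearrange` are illustrative assumptions, not the conversion unit circuit of FIG. 10.

```python
# Assumed slot order: one execution unit per kind (hypothetical layout).
UNIT_ARRANGEMENT = ("LU", "IU", "FU", "BU")


def rearrange(word, arrangement=UNIT_ARRANGEMENT):
    """Place each (kind, text) basic instruction of one instruction word in
    the slot of a free execution unit of the matching kind."""
    slots = [None] * len(arrangement)
    for kind, text in word:
        for i, unit_kind in enumerate(arrangement):
            if unit_kind == kind and slots[i] is None:
                slots[i] = text
                break
        else:
            # No free unit of the required kind exists for this instruction.
            raise ValueError(f"no free {kind} unit for {text!r}")
    return slots
```

Once the basic instructions arrive already ordered by slot, the issue stage only has to forward slot i to unit i, which is one way to read the statement that rearrangement simplifies issuing.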
[0113]

Further, more effective parallel processing
can be achieved when, depending on the type of a basic
instruction being currently executed, the instruction
issue unit issues a next basic instruction before the
execution of the basic instruction being currently
executed is completed.



[Brief Description of the Drawings]

FIG. 1 shows the structure of a conventional 
parallel processor; 

FIG. 2 shows the formats of instruction words
to be supplied to a conventional parallel processor
having four instruction execution units;

FIG. 3 shows the structure of a first example
of a parallel processor in accordance with a first
embodiment of the present invention;

FIG. 4 shows the structures of an instruction 
fetch unit and an instruction issue unit of the parallel 
processor shown in FIG. 3; 

FIG. 5 shows the formats of instruction words
to be supplied to the parallel processor of the first
embodiment of the present invention;

FIG. 6 shows the structure of a second example
of the parallel processor in accordance with the first
embodiment of the present invention;

FIG. 7 shows the structure of a first example
of a parallel processor in accordance with a second
embodiment of the present invention;

FIG. 8 shows the structures of an instruction 
fetch unit and an instruction issue unit of the parallel 
processor shown in FIG. 7; 

FIG. 9 illustrates basic instruction 
rearrangement in the parallel processor of the second 
embodiment of the present invention; 

FIG. 10 is a circuit diagram of a conversion 
unit in the parallel processor shown in FIG. 7; 

FIG. 11 is a circuit diagram of the conversion 
unit in a case where the maximum basic instruction word 
length is 4; 

FIG. 12 shows the structure of a second 
example of the parallel processor in accordance with the 
second embodiment of the present invention; 

FIG. 13 shows the structures of an instruction 
fetch unit and an instruction issue unit of the parallel 
processor shown in FIG. 12; 

FIG. 14 shows the structure of a third example 
of the parallel processor in accordance with the second 
embodiment of the present invention; 

FIG. 15 shows the structures of an instruction
fetch unit and an instruction issue unit of the parallel
processor shown in FIG. 14;

FIG. 16 shows the structure of a fourth
example of the parallel processor in accordance with the
second embodiment of the present invention;

FIG. 17 shows the structures of an instruction
fetch unit and an instruction issue unit of the parallel
processor shown in FIG. 16;

FIG. 18 shows the structure of a fifth example
of the parallel processor in accordance with the second
embodiment of the present invention;

FIG. 19 shows the structures of an instruction
fetch unit and an instruction issue unit of the parallel
processor shown in FIG. 18;

FIG. 20 shows the structure of a sixth example
of the parallel processor in accordance with the second
embodiment of the present invention;

FIG. 21 shows the structures of an instruction
fetch unit and an instruction issue unit of the parallel
processor shown in FIG. 20;

FIG. 22 shows the structure of a first example
of a parallel processor in accordance with a third
embodiment of the present invention;

FIG. 23 shows the structure of a second
example of the parallel processor in accordance with the
third embodiment of the present invention;

FIG. 24 shows the structure of a third example
of the parallel processor in accordance with the third
embodiment of the present invention;

FIG. 25 shows the structure of a fourth
example of the parallel processor in accordance with the
third embodiment of the present invention;

FIG. 26 shows the structure of a fifth example
of the parallel processor in accordance with the third
embodiment of the present invention;

FIG. 27 shows the structure of a sixth example
of the parallel processor in accordance with the third
embodiment of the present invention;

FIG. 28 shows the structure of a first example
of a parallel processor in accordance with a fourth
embodiment of the present invention;

FIG. 29 shows the structure of a second
example of the parallel processor in accordance with the
fourth embodiment of the present invention;

FIG. 30 shows the structure of a third example
of the parallel processor in accordance with the fourth
embodiment of the present invention;

FIG. 31 shows the structure of a fourth 
example of the parallel processor in accordance with the 
fourth embodiment of the present invention; 

FIG. 32 shows the structure of a fifth example
of the parallel processor in accordance with the fourth
embodiment of the present invention;

FIG. 33 shows the structure of a sixth example
of the parallel processor in accordance with the fourth
embodiment of the present invention;

FIG. 34 shows the structure of a first example
of a parallel processor in accordance with a fifth
embodiment of the present invention;

FIG. 35 shows the structure of a second
example of the parallel processor in accordance with the
fifth embodiment of the present invention;

FIG. 36 shows the structure of a third example
of the parallel processor in accordance with the fifth
embodiment of the present invention;

FIG. 37 shows the structure of a fourth
example of the parallel processor in accordance with the
fifth embodiment of the present invention;

FIG. 38 shows the structure of a fifth example
of the parallel processor in accordance with the fifth
embodiment of the present invention; and

FIG. 39 shows the structure of a sixth example
of the parallel processor in accordance with the fifth
embodiment of the present invention.



[Description of the Reference Numbers]
1, 46-71 INSTRUCTION FETCH UNIT
3, 72-97 INSTRUCTION ISSUE UNIT
5, 98-102 REGISTER UNIT
7, 12 MEMORY
10 PARALLEL PROCESSOR
13, 17 INSTRUCTION WORD FORMAT
15 INTERFACE
20-45 PROCESSOR
103-114, 219 JUDGMENT UNIT
115-146 CONVERSION UNIT
147, 148 BRANCH INSTRUCTION (BI) DETECTOR BLOCK
149, 150 FLOATING-POINT ARITHMETIC INSTRUCTION (FI) DETECTOR BLOCK
151, 152 INTEGER ARITHMETIC INSTRUCTION (II) DETECTOR BLOCK
153, 154 LOAD STORE INSTRUCTION (LI) DETECTOR BLOCK
155-162 BUFFER
163-186 AND GATE
187-198 EXCLUSIVE OR GATE
199-208 OR GATE
209-218, 355-368 SELECTOR
300-306 FETCH PROGRAM COUNTER (FPC)
308-314 INSTRUCTION BUFFER
316-322 CUTTING UNIT
324-337 ADDER
339-345 EXECUTION PROGRAM COUNTER (EPC)
347-353 INSTRUCTION REGISTER
370-376 CONTROL UNIT
378-381 AND GATE
EU0-EUn, LU0, LU1, IU0, IU1, FU0, FU1, MU0, MU1, BU0, BU1 INSTRUCTION EXECUTION UNIT
BD1-BD4 BRANCH INSTRUCTION (BI) DETECTOR
FD1-FD4 FLOATING-POINT ARITHMETIC INSTRUCTION (FI) DETECTOR
ID1-ID4 INTEGER ARITHMETIC INSTRUCTION (II) DETECTOR
LD1-LD4 LOAD STORE INSTRUCTION (LI) DETECTOR
L1-L4 TRANSMISSION LINE



